The last decade has seen much effort directed to addressing data anonymization: the problem of publishing a modified version of a dataset so that inference about individuals in the data set is limited. As with cryptography, progress has been made by new methods being proposed and their weaknesses being identified, leading to improved algorithms.
The simplest anonymity definition is k-anonymity: every individual in the data set should be indistinguishable from k-1 others. However, if there is homogoneity among this set of individuals then undesired inference is possible. To counter this, definitions such as l-diversity and t-closeness were advanced. Recently it was shown that over eager attempts to enforce these definitions can itself allow unwanted inferences to be drawn. This “minimality attack” has stood for several years. But recently it has been shown that this attack is less powerful than first thought: appropriately designed algorithms are invulnerable, and even in the most pessimistic case, the power of the attack can be bounded by a multiplicative factor of e (=2.78...).
In this talk, I will introduce these approaches to anonymization from first principles, working through measure and counter-measure, ending with our recent analysis of the minimality attack.
[ bib | slides ] Back
This file was generated by bibtex2html 1.92.