Many users are clamouring for privacy techniques so that they can publish, share or sell their data safely. Several aspects of the data then have to be managed: it is often sparse, multi-dimensional, and structured. Each of these features calls for mechanisms that are aware of it, so that the output is useful and efficient to compute, in addition to carrying a privacy guarantee. The state-of-the-art goal for this problem is differential privacy, which offers a strong degree of privacy protection without making restrictive assumptions about the adversary. Existing techniques using differential privacy, however, cannot effectively handle the publication of high-dimensional data. In particular, when the input dataset contains a large number of attributes, existing methods require injecting a prohibitive amount of noise compared to the signal in the data, which renders the published data next to useless. This talk outlines some recent efforts to generate practical mechanisms for high-dimensional data release. The key insight is the need to build a compact model for the data which accurately represents its structure, while remaining resilient to the noise introduced to preserve privacy. Models based on Bayesian networks turn out to be particularly effective for this task. However, constructing Bayesian networks privately is itself a significant challenge. This leads to PrivBayes, a novel approach that identifies correlations among attributes and releases suitable marginals while ensuring differential privacy. Experimental evaluations of PrivBayes on real data demonstrate that it significantly outperforms existing solutions in terms of accuracy.
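The noise-versus-signal problem described above can be illustrated with a small sketch. This is not the PrivBayes algorithm; it only shows, under stated assumptions, why a full high-dimensional histogram is swamped by Laplace noise while a low-dimensional marginal is not. The assumptions are: all attributes are binary, neighbouring datasets differ by adding or removing one record (so a histogram has sensitivity 1), and the function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(counts, epsilon, rng):
    # Laplace mechanism for a histogram. Assumption: adding or removing
    # one record changes exactly one cell by 1, so sensitivity is 1.
    scale = 1.0 / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


def mean_count_per_cell(n_records, n_binary_attrs):
    # Average true count per cell when n_records are spread over
    # the 2^d cells of a table on d binary attributes.
    return n_records / (2 ** n_binary_attrs)


def expected_abs_noise(epsilon):
    # For Laplace(0, b), the expected absolute value is b = 1/epsilon.
    return 1.0 / epsilon


if __name__ == "__main__":
    n, eps = 10_000, 0.1
    for d in (2, 10, 20):
        print(d, mean_count_per_cell(n, d), expected_abs_noise(eps))
```

With 10,000 records and epsilon = 0.1, a 2-attribute marginal has far more signal per cell than noise, whereas the full table over 20 attributes has an average count per cell well below the expected noise magnitude. This is the gap that releasing a small set of well-chosen low-dimensional marginals is meant to close.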