Dated Jan 22, 2019;
last modified on Sun, 14 Mar 2021
A data release mechanism is differentially private if any computation performed on the released data yields essentially the same result if a single data point is added/removed.
More specifically, let \(Q\) be any (probabilistic) query mechanism. \(Q\) is \(\epsilon\)-differentially private if, for all databases \(B\), \(B'\) that differ in one item, all functions \(F\), and all values \(y\):

\[ \Pr[F(Q(B)) = y] \le e^{\epsilon} \cdot \Pr[F(Q(B')) = y]. \]
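As a concrete sketch of the definition, a count query can be made \(\epsilon\)-differentially private by adding Laplace noise scaled to the query's sensitivity. The function names and parameters below are illustrative, not from any particular library:

```python
import math
import random

def laplace_noise(scale):
    # Sample Laplace(0, scale) by inverse-CDF transform, stdlib only.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon):
    """epsilon-DP count via the Laplace mechanism.

    A count has sensitivity 1 (adding or removing one record changes
    the true count by at most 1), so the noise scale is 1/epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Illustrative usage: noisy count of people aged 65+ in a made-up dataset.
ages = [70, 30, 66, 12, 80]
noisy = dp_count(ages, lambda a: a >= 65, epsilon=1.0)
```

Smaller \(\epsilon\) means a larger noise scale, i.e. stronger privacy but a less accurate answer.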
The goal is to make releasing large datasets safe, i.e. you suffer essentially no additional harm from having your data included.
Sanitize-and-Transfer Model: Remove all PII, e.g. name, SSN, mobile number, etc., before releasing the dataset.
However, the list of PII is not necessarily complete. Furthermore, combinations of seemingly non-PII data can be jointly identifying.
Using an auxiliary dataset is a common re-identification method, e.g. Narayanan and Shmatikov [2010] combined IMDB ratings and comments to deanonymize a Netflix dataset.
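A minimal sketch of such a linkage attack, with all names and records invented for illustration: the "anonymized" release drops names but keeps quasi-identifiers, and joining it with a public auxiliary dataset on those shared fields restores the identities.

```python
# Hypothetical "anonymized" health release: names removed,
# quasi-identifiers (zip, birth date, sex) retained.
anonymized = [
    {"zip": "02138", "birth": "1975-07-31", "sex": "F", "diagnosis": "flu"},
    {"zip": "02139", "birth": "1980-01-02", "sex": "M", "diagnosis": "asthma"},
]

# Hypothetical public auxiliary dataset, e.g. a voter roll.
auxiliary = [
    {"name": "Alice", "zip": "02138", "birth": "1975-07-31", "sex": "F"},
    {"name": "Bob",   "zip": "02139", "birth": "1980-01-02", "sex": "M"},
]

def reidentify(anon, aux, keys=("zip", "birth", "sex")):
    # Join the two datasets on the shared quasi-identifiers.
    index = {tuple(row[k] for k in keys): row["name"] for row in aux}
    return {
        index[key]: row["diagnosis"]
        for row in anon
        if (key := tuple(row[k] for k in keys)) in index
    }
```

Here `reidentify(anonymized, auxiliary)` links every "anonymized" diagnosis back to a name, even though no single quasi-identifier is PII on its own.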
Myths and Fallacies of 'Personally Identifiable Information'. Narayanan, Arvind; Shmatikov, Vitaly. dl.acm.org. Jun 1, 2010. What is PII? Breach notification laws give lists: for example, California Senate Bill 1386 covers SSNs, driver's license numbers, and financial account numbers.
The list can never be exhaustive; e.g., email addresses and telephone numbers are not mentioned in Bill 1386.
Such laws focus on data commonly used to authenticate an individual, and ignore data that merely reveals sensitive information about an individual.
Differential Privacy: A Primer for a Non-technical Audience. Kobbi Nissim; Thomas Steinke; Alexandra Wood; Micah Altman; Aaron Bembenek; Mark Bun; Marco Gaboardi; David R. O'Brien; Salil Vadhan. www.ftc.gov. May 7, 2017. What Does DP Guarantee? It is a question of whether a particular computation (not output) preserves privacy.
DP only guarantees that no information specific to an individual is revealed by the computation. DP doesn't protect against information that could be learned even if that individual opted out of the dataset, e.g. a study showing that smoking causes cancer reveals something about every smoker, whether or not they participated.
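The guarantee in the definition above can be checked numerically for the Laplace mechanism: for neighboring databases whose true counts differ by one, the ratio of the output densities never exceeds \(e^{\epsilon}\). A small sketch, with illustrative values:

```python
import math

def laplace_pdf(y, mu, scale):
    # Density of Laplace(mu, scale) at y.
    return math.exp(-abs(y - mu) / scale) / (2.0 * scale)

epsilon = 0.5
scale = 1.0 / epsilon        # scale for a sensitivity-1 count query
count_b, count_bprime = 42, 43  # true counts on neighboring databases

# At every output y, the density ratio is bounded by e^epsilon,
# because |y - 43| - |y - 42| is at most 1 in absolute value.
for y in [40.0 + 0.1 * i for i in range(60)]:
    ratio = laplace_pdf(y, count_b, scale) / laplace_pdf(y, count_bprime, scale)
    assert ratio <= math.exp(epsilon) + 1e-12
```

The bound is tight: for outputs on the far side of both counts (e.g. \(y < 42\)), the ratio equals exactly \(e^{\epsilon}\).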
Cynthia Dwork, a pioneer of Differential Privacy, has just won the 2020 Knuth Prize.