Building Intuition About DP

Dated Jan 22, 2019; last modified on Sun, 14 Mar 2021

Scenario

  • Researcher wants to estimate percentage of HIV positive individuals.
  • Survey respondents don’t want to disclose their HIV status.

DP Solution

Each respondent independently flips their answer with probability \(p\) before reporting it, and reports it truthfully otherwise.
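A minimal sketch of the flipping step, assuming answers are encoded as 0/1. The function name `randomized_response` and the `rng` parameter are made up for illustration and are not from any library.

```python
import random

def randomized_response(true_answer, p, rng=random):
    """Return true_answer (0 or 1), flipped with probability p.

    `rng` is anything with a .random() method returning a float in [0, 1);
    it defaults to the standard-library `random` module.
    """
    if rng.random() < p:
        return 1 - true_answer  # lie: report the opposite answer
    return true_answer          # tell the truth

# Hypothetical usage: five respondents, each flipping with p = 0.25.
rng = random.Random(42)
true_answers = [1, 0, 0, 1, 0]
reports = [randomized_response(x, 0.25, rng) for x in true_answers]
print(reports)
```

The researcher only ever sees `reports`, so any single reported answer could be a flip.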

Exercise: show that respondent privacy is protected
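A sketch of one way to approach this, assuming \(0 < p < 1/2\) and writing \(x\) for a respondent's true answer and \(y\) for the reported one (notation introduced here): whichever answer is reported, changing the true status changes its probability by a factor of at most \((1 - p)/p\).

$$ \frac{\Pr[y = 1 \mid x = 1]}{\Pr[y = 1 \mid x = 0]} = \frac{1 - p}{p}, \qquad \frac{\Pr[y = 0 \mid x = 1]}{\Pr[y = 0 \mid x = 0]} = \frac{p}{1 - p} $$

Both ratios lie in \([e^{-\varepsilon}, e^{\varepsilon}]\) with \(\varepsilon = \ln\frac{1 - p}{p}\), which is the \(\varepsilon\)-differential privacy condition for a single respondent. For example, \(p = 1/4\) gives \(\varepsilon = \ln 3 \approx 1.1\); as \(p \to 1/2\), \(\varepsilon \to 0\) and the report reveals nothing.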

Exercise: show that researcher can accurately estimate the quantity of interest

Let \(x_i = 1\) if the \(i\)-th respondent is HIV positive, and zero otherwise. We’re interested in \( S/n \), where \(S = \sum_{i=1}^{n}{x_i} \).

Let \(y_i\) be the answer that respondent \(i\) reports, so \(y_i = x_i\) with probability \(1 - p\) and \(y_i = 1 - x_i\) with probability \(p\), and let \(\hat{S} = \sum_{i=1}^{n}{y_i}\) be the observed count of positive answers.

$$ \mathbb{E}[\hat{S}] = \sum_{i=1}^{n}{ \left( (1 - p) \cdot x_i + p \cdot |x_i - 1| \right) } $$

Since \(x_i \in \{0, 1\}\), \(|x_i - 1| = 1 - x_i\). Therefore:

$$ \mathbb{E}[\hat{S}] = \sum_{i=1}^{n}{ ((1 - p) \cdot x_i - p \cdot (x_i - 1)) } $$ $$ = \sum_{i=1}^{n}{ (x_i - 2px_i + p) } $$ $$ = pn + (1 - 2p) \sum_{i=1}^{n}{x_i} $$

This is an equality rather than a bound, so no separate upper bound is needed.

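Each \(x_i\) is a fixed, unknown value rather than a random variable, so \(\sum x_i\) needs no simplifying; the only randomness is in the flips. The expected number of reported positives is exactly \(pn + (1 - 2p) \sum x_i\) (because \(|x_i - 1| = 1 - x_i\) for \(x_i \in \{0, 1\}\)). For \(p \ne 1/2\), subtracting \(pn\) from the observed count and dividing by \(1 - 2p\) gives an unbiased estimate of \(\sum x_i\), and dividing that by \(n\) estimates the fraction of interest.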
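The expression \(pn + (1 - 2p) \sum x_i\) suggests a debiasing step: take the observed fraction of positive reports, subtract \(p\), and divide by \(1 - 2p\). Below is a rough simulation of that idea. The function name `estimate_fraction`, the population size, and the 10% true rate are all made up for illustration.

```python
import random

def estimate_fraction(reports, p):
    """Debias the observed fraction of positive (1) reports.

    Assumes each report is the true 0/1 answer flipped independently
    with probability p, and that p != 0.5.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes reports independent of the truth")
    observed = sum(reports) / len(reports)
    return (observed - p) / (1 - 2 * p)

# Hypothetical population: 10,000 respondents, 10% truly positive.
rng = random.Random(0)
p = 0.25
truth = [1] * 1_000 + [0] * 9_000

# Each respondent flips their answer with probability p.
reports = [1 - x if rng.random() < p else x for x in truth]

estimate = estimate_fraction(reports, p)
print(f"true fraction: {sum(truth) / len(truth):.3f}")
print(f"estimate:      {estimate:.3f}")
```

Each reported answer has variance \(p(1 - p)\) regardless of \(x_i\), so under independent flips this estimate has standard deviation \(\sqrt{p(1 - p)/n} \,/\, |1 - 2p|\). The error shrinks as \(n\) grows and blows up as \(p\) approaches \(1/2\), which is the accuracy side of the privacy trade-off.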