Scenario
- A researcher wants to estimate the percentage of HIV-positive individuals.
- Survey respondents don’t want to disclose their HIV status.
DP Solution
Each respondent independently flips their true answer with probability \(p\) before reporting it.
Exercise: show that respondent privacy is protected
Exercise: show that researcher can accurately estimate the quantity of interest
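For the first exercise, one way to make "privacy is protected" concrete is to compare how likely a given report is under the two possible true answers. The sketch below is only an illustration: the function names `report_probability` and `privacy_loss` are made up here, and it assumes \(0 < p < 1\). It computes the worst-case log-likelihood ratio over both possible reports, which for this flipping scheme works out to \(|\ln((1 - p)/p)|\).

```python
import math


def report_probability(true_answer: int, reported: int, p: float) -> float:
    """P(reported | true_answer) when the true answer is flipped with probability p."""
    return 1 - p if reported == true_answer else p


def privacy_loss(p: float) -> float:
    """Worst-case log-likelihood ratio between the two possible true answers.

    Hypothetical helper: checks every (report, true answer pair) combination
    and returns the log of the largest ratio.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    ratios = [
        report_probability(a, r, p) / report_probability(b, r, p)
        for r in (0, 1)
        for a, b in ((0, 1), (1, 0))
    ]
    return math.log(max(ratios))


if __name__ == "__main__":
    for p in (0.1, 0.25, 0.5):
        print(f"p={p}: worst-case log ratio = {privacy_loss(p):.3f}")
```

The closer \(p\) is to \(1/2\), the smaller this value, meaning a single report reveals less about the respondent's true status. At \(p = 1/2\) the report is independent of the truth.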
Let \(x_i = 1\) if the \(i\)’th respondent is HIV positive, and zero otherwise. We’re interested in \( S/n \), where \(S = \sum_{i=1}^{n}{x_i} \) and \(n\) is the number of respondents.
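Let \(y_i\) be the answer the \(i\)’th respondent actually reports, so \(y_i = x_i\) with probability \(1 - p\) and \(y_i = 1 - x_i\) with probability \(p\), and let \(Y = \sum_{i=1}^{n}{y_i}\) be the observed count. Then:
$$ \mathbb{E}[Y] = \sum_{i=1}^{n}{ ((1 - p) \cdot x_i + p \cdot |x_i - 1|) } $$
Because \(x_i \in \{0, 1\}\), we have \(|x_i - 1| = 1 - x_i\), so this is an exact equality rather than a bound:
$$ \mathbb{E}[Y] = \sum_{i=1}^{n}{ ((1 - p) \cdot x_i - p \cdot (x_i - 1)) } $$ $$ = \sum_{i=1}^{n}{ (x_i - 2px_i + p) } $$ $$ = pn + (1 - 2p) \sum_{i=1}^{n}{x_i} $$
Therefore, for \(p \ne 1/2\):
$$ S = \frac{\mathbb{E}[Y] - pn}{1 - 2p} $$
so \((Y - pn)/(1 - 2p)\) is an unbiased estimator of \(S\).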
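At first it wasn’t clear to me how this helps prove anything. The point is that the \(x_i\) are fixed, unknown values, not random variables: the only randomness is in the flips. So \(\sum x_i\) does not need to be simplified; it is exactly the quantity \(S\) being estimated.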
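A quick simulation can sanity-check the accuracy exercise. The sketch below assumes the expected reported count is \(pn + (1 - 2p) \sum x_i\), inverts that relation to estimate \(\sum x_i\), and averages the estimate over repeated surveys. The function names `randomized_response` and `estimate_sum`, the population of 1000 with 100 positives, and the choice \(p = 0.25\) are all made up for illustration.

```python
import random


def randomized_response(x, p, rng):
    """Return the reported answers: each true answer is flipped with probability p."""
    return [1 - xi if rng.random() < p else xi for xi in x]


def estimate_sum(reports, p):
    """Invert E[Y] = p*n + (1 - 2p)*S to estimate S from the observed count Y."""
    if p == 0.5:
        raise ValueError("p = 1/2 makes the reports independent of the truth")
    n = len(reports)
    return (sum(reports) - p * n) / (1 - 2 * p)


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed so runs are repeatable
    x = [1] * 100 + [0] * 900        # illustrative data: 100 positives out of 1000
    p = 0.25
    trials = 200
    estimates = [estimate_sum(randomized_response(x, p, rng), p) for _ in range(trials)]
    print("true S:", sum(x))
    print("mean estimate:", sum(estimates) / trials)
```

Any single estimate is noisy, since each reported answer has variance \(p(1 - p)\), but the average over many simulated surveys should settle near the true count. Smaller \(p\) gives a tighter estimate at the cost of weaker privacy.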