Differential privacy [1/n]
Differential privacy is mathematically rigorous definition of "data privacy" (received the Gödel Prize in 2017).
Definition:
Let Zᵐ be a space of datasets with m rows.
Let d(x, x') be the number of rows at which x, x' ∈ Zᵐ differ.
Let M be a mechanism that takes a dataset x ∈ Zᵐ as input and outputs a random variable M(x) ∈ Y.
M is called (ε, δ)-differentially private if
P(M(x) ∈ S) ≤ exp(ε) P(M(x') ∈ S) + δ,
∀ x, x' ∈ Zᵐ with d(x, x') = 1, ∀ measurable S ∈ Y.
Differential privacy [3/n]
Another condition for differential privacy involves the tail probabilities of the "privacy loss":
Privacy loss := log( P(M(x) = ξ) / P(M(x') = ξ) ),
where x, x' ∈ Zᵐ have d(x, x') = 1, and ξ follows the distribution of M(x).
This holds:
M is (ε, δ)-differentially private if
P(privacy loss ≤ ε) ≥ 1 - δ.
However I find it hard to grasp the intuition for this result... need to think more about it...
[notice that E(privacy loss) = KL-divergence btwn M(x) and M(x')]
Differential privacy [5/n]
A crucial property of differential privacy (d.p.) is that it is preserved under post-processing and adaptive composition in the following sense:
If M₁, M₂, ..., Mₘ are (ε₁, δ₁)-d.p., (ε₂, δ₂)-d.p., ..., (εₘ, δₘ)-d.p respectively, then any function of (M₁, M₂, ..., Mₘ) is (∑ εᵢ, ∑ δᵢ)-d.p.
Whereby the choice of Mᵢ₊₁ may depend on the output of M₁, ..., Mᵢ. Hence *adaptive* composition.
Differential privacy [4/n]
What does a differentially private data access mechanism look like?
The simplest example is the Laplace Mechanism:
Let Zᵐ be a space of datasets with m rows. Let Lap(b) denote the Laplace distribution with scale parameter b>0.
Assume that we want to evaluate a function f: Zᵐ → R on a dataset x ∈ Zᵐ.
The Laplace Mechanism, given by
M(x) = f(x) + Lap(b),
is (ε, 0)-differentially private (or simply ε-d.p.) if
b > sup { |f(x) - f(x')| : d(x, x') = 1 } / ε.