Privacy and Differential Privacy Basics

Models trained on personal data can leak it. A model might memorize a unique record and reveal it later. Privacy techniques aim to learn patterns without exposing any one person.

The core idea

Differential privacy gives a formal guarantee: the output of an analysis should be almost the same whether or not any single individual is in the dataset. If one person's presence barely changes the result, an attacker cannot confidently tell whether that person was included.
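
The guarantee can be written down precisely. The statement below is the standard definition of epsilon-differential privacy; the symbols M (a randomized mechanism), D and D' (two datasets that differ in one individual's record), and S (any set of possible outputs) are notation introduced here for illustration and do not appear elsewhere in this text.

```latex
\Pr[\, M(D) \in S \,] \;\le\; e^{\varepsilon} \cdot \Pr[\, M(D') \in S \,]
\quad \text{for all neighboring } D, D' \text{ and all output sets } S
```

When epsilon is small, the factor e^epsilon is close to 1, so the output distributions with and without the individual are nearly indistinguishable.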

How it is achieved

Noise is added to results or gradients, calibrated to how much a single individual can change them (the sensitivity), so individual contributions blur.
A parameter called the privacy budget, often written epsilon, controls the tradeoff. A smaller budget means stronger privacy and more noise.
In training, noisy gradient methods such as DP-SGD clip each example's gradient to a fixed norm, then add noise to the aggregated gradient before the update.
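
As a rough illustration of the points above, here is a minimal Python sketch using only the standard library. It shows Laplace noise on a counting query and a clip-then-perturb step on per-example gradients. All function names and parameters (private_count, noisy_mean_gradient, max_norm, noise_multiplier) are hypothetical and chosen for this example. The sketch is not production-grade: naive floating-point noise sampling has known weaknesses, and turning a noise multiplier into a concrete epsilon requires a privacy accountant that is not shown here.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Counting query with Laplace noise.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def clip_to_norm(gradient, max_norm):
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in gradient]


def noisy_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip_to_norm(grad, max_norm)
        total = [t + c for t, c in zip(total, clipped)]
    # Noise standard deviation is tied to the clipping bound (assumed convention).
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


# Illustrative usage with made-up data.
rng = random.Random(0)
ages = [23, 35, 41, 29, 62, 57]
print(private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng))
print(noisy_mean_gradient([[3.0, 4.0], [0.1, -0.2]], max_norm=1.0,
                          noise_multiplier=1.1, rng=rng))
```

Lowering epsilon in private_count widens the noise, which is the "smaller budget, more noise" relationship described above. Clipping matters because it bounds how much any one example can move the gradient, which is what lets the added noise hide that example.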

The tradeoff

Privacy is not free. More noise gives a stronger guarantee but a less accurate model. Teams choose a budget that balances protection against utility. The budget is also cumulative: every query or training run on the same data spends part of it, so total spending must be tracked.
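
One simple way to picture budget tracking is a ledger that refuses further queries once the total is spent. The sketch below assumes basic sequential composition, under which the epsilons of successive analyses on the same data add up. The class name PrivacyBudget and its methods are hypothetical; real systems often use tighter accounting methods than plain addition.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self):
        """Epsilon still available for future queries."""
        return self.total_epsilon - self.spent

    def spend(self, epsilon):
        """Record a query costing `epsilon`, or refuse if it would overspend."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so float rounding does not reject an exact fit.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon


# Illustrative usage: two queries fit in a budget of 1.0, a third does not.
budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.4)
budget.spend(0.4)
try:
    budget.spend(0.4)
except RuntimeError as err:
    print("refused:", err)
```

Calling spend before each noisy release makes the accumulated cost explicit instead of leaving it to be reconstructed after the fact.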

Why it matters

Beyond regulatory compliance, differential privacy lets organizations share insights and train on sensitive data with a mathematically defensible promise rather than vague assurances.

Key idea

Differential privacy adds calibrated noise so any single individual barely affects the output, trading accuracy for a formal privacy guarantee tracked by a budget.
