Privacy-Preserving ML

The privacy problem

Models trained on personal data can leak it. A model may memorize rare records, and attackers can sometimes recover training examples or test whether a specific person was in the data. Privacy-preserving ML aims to learn useful patterns while limiting what the model reveals about any individual.

Threats to guard against

Membership inference: deciding whether a given record was in the training set.
Model inversion: reconstructing sensitive features from model behavior.
Memorization leakage: a generative model reciting verbatim training text.
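
To make the first threat concrete, here is a minimal sketch of a loss-threshold membership inference test. It assumes the attacker can see the probability the model assigns to each record's true label, and it relies on the common observation that models tend to fit training records more tightly than unseen ones. The function names, the threshold, and the probabilities in the usage note are hypothetical and chosen only for illustration.

```python
import math


def cross_entropy(prob_of_true_label):
    """Per-example loss from the probability the model gives the true label."""
    # Clamp to avoid log(0) when the model assigns zero probability.
    return -math.log(max(prob_of_true_label, 1e-12))


def guess_membership(prob_of_true_label, loss_threshold):
    """Guess 'was in the training set' when the loss falls below the threshold."""
    return cross_entropy(prob_of_true_label) < loss_threshold


def attack_accuracy(member_probs, nonmember_probs, loss_threshold):
    """Fraction of records whose membership the threshold rule guesses correctly."""
    correct = sum(guess_membership(p, loss_threshold) for p in member_probs)
    correct += sum(not guess_membership(p, loss_threshold) for p in nonmember_probs)
    return correct / (len(member_probs) + len(nonmember_probs))
```

For example, with made-up member probabilities [0.99, 0.97, 0.95, 0.60], non-member probabilities [0.70, 0.55, 0.90, 0.40], and a threshold of 0.2, the rule gets 6 of 8 guesses right. Accuracy well above 0.5 on a balanced set like this suggests the model treats training records differently, which is the signal the attack exploits.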

The main toolbox

Differential privacy: add calibrated noise so that adding or removing any single record changes the output distribution only slightly.
Federated learning: keep raw data on devices and share only model updates.
Secure computation: compute on encrypted or secret-shared data so the server never sees raw values.
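
The three tools can each be sketched in a few lines. This is a toy illustration under stated assumptions, not a production implementation: the differential privacy part uses the Laplace mechanism on a counting query (sensitivity 1), the federated part is a size-weighted average of client updates, and the secure computation part uses additive secret sharing rather than encryption. All function names and the modulus are hypothetical, and Python's `random` module is not a cryptographically secure source of randomness.

```python
import random

MODULUS = 2**61 - 1  # assumed modulus for the toy secret-sharing scheme


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Differential privacy: noisy count via the Laplace mechanism.

    One record changes a count by at most 1, so the noise scale is 1/epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def federated_average(client_updates, client_sizes):
    """Federated learning: average client updates weighted by local data size."""
    total = sum(client_sizes)
    dim = len(client_updates[0])
    return [
        sum(update[i] * n for update, n in zip(client_updates, client_sizes)) / total
        for i in range(dim)
    ]


def share(value, n_parties, rng):
    """Secure computation: split a value into additive shares mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine additive shares; fewer than all of them look uniformly random."""
    return sum(shares) % MODULUS


def secure_sum(values, n_parties, rng):
    """Each party adds up the shares it holds; only the total is reconstructed."""
    all_shares = [share(v, n_parties, rng) for v in values]
    per_party_totals = [sum(column) % MODULUS for column in zip(*all_shares)]
    return reconstruct(per_party_totals)
```

A call such as `secure_sum([5, 7, 11], 3, random.Random(0))` returns the total 23 without any single party holding an individual input in the clear. In practice these tools are often combined, for example by adding noise to federated updates.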

The core tradeoff

Stronger privacy usually means more noise, which can lower accuracy, or more coordination and computation cost, which slows training. The goal is to bound individual exposure while keeping aggregate utility.
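
The noise side of the tradeoff can be measured directly. The sketch below assumes a counting query answered with Laplace noise of scale 1/epsilon, where epsilon is the differential privacy budget (smaller means stronger privacy). It estimates the average absolute error for several budgets. The function names, trial count, and seed are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # Difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mean_abs_error(epsilon, trials, rng):
    """Average absolute noise added to a sensitivity-1 query at this epsilon."""
    scale = 1.0 / epsilon
    return sum(abs(laplace_noise(scale, rng)) for _ in range(trials)) / trials


def tradeoff_table(epsilons, trials=5000, seed=0):
    """Return (epsilon, estimated error) pairs for each privacy budget."""
    rng = random.Random(seed)
    return [(eps, mean_abs_error(eps, trials, rng)) for eps in epsilons]
```

The expected absolute value of Laplace noise equals its scale, so the estimated error should land near 1/epsilon: roughly 10 at epsilon 0.1, 1 at epsilon 1, and 0.1 at epsilon 10. Tightening the budget by a factor of ten costs about ten times the error on this query.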

Key idea

Privacy-preserving ML defends against membership inference, model inversion, and memorization leakage by using differential privacy, federated learning, and secure computation, trading some accuracy to bound any individual's exposure.
