The privacy problem
Models trained on personal data can leak it. A model may memorize rare records, and attackers can sometimes recover training examples or test whether a specific person was in the data. Privacy-preserving ML aims to learn useful patterns while limiting what the model reveals about any individual.
Threats to guard against
- Membership inference: deciding whether a given record was in the training set.
- Model inversion: reconstructing sensitive features from model behavior.
- Memorization leakage: a generative model reciting verbatim training text.
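To make the first threat concrete, here is a minimal sketch of a loss-threshold membership inference test. It relies on the observation that models often assign lower loss to records they were trained on. The confidence values, the threshold, and the function names are all hypothetical and chosen for illustration; a real attack would calibrate the threshold, for example with shadow models.

```python
import math

def cross_entropy(prob_true_label: float) -> float:
    """Loss for one record, given the model's predicted probability of its true label."""
    return -math.log(max(prob_true_label, 1e-12))

def guess_membership(prob_true_label: float, threshold: float) -> bool:
    """Guess 'was in the training set' when the loss falls below the threshold."""
    return cross_entropy(prob_true_label) < threshold

# Hypothetical model confidences on the true label (not real measurements).
member_conf = [0.99, 0.97, 0.95]      # records the model was trained on
nonmember_conf = [0.70, 0.55, 0.90]   # records it never saw
threshold = 0.08                      # assumed attacker-chosen cutoff

member_guesses = [guess_membership(p, threshold) for p in member_conf]
nonmember_guesses = [guess_membership(p, threshold) for p in nonmember_conf]
print(member_guesses, nonmember_guesses)
```

On these made-up numbers the attacker separates members from non-members perfectly. In practice the gap is noisier, but any systematic gap is a privacy signal.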
The main toolbox
- Differential privacy: add calibrated noise so that adding or removing any single record changes the distribution of outputs only slightly.
- Federated learning: keep raw data on devices and only share model updates.
- Secure computation: compute on encrypted or secret-shared data so that no single server sees raw values.
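The last two tools are often combined: clients keep their data and send only updates, and the updates are aggregated so that no server sees an individual one. Below is a toy sketch using additive secret sharing. It assumes updates have already been quantized to non-negative integers and that the servers do not collude. `MODULUS`, `share`, `reconstruct`, and `secure_sum` are illustrative names, not a real library API, and this is not a hardened protocol.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # assumed modulus, large enough for the sums in this example

def share(value: int, n_parties: int) -> List[int]:
    """Split value into n random-looking shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def reconstruct(shares: List[int]) -> int:
    """Recombine shares; any proper subset of them reveals nothing about the value."""
    return sum(shares) % MODULUS

def secure_sum(client_values: List[int], n_servers: int = 3) -> int:
    """Each client sends one share to each server; servers add up only the shares they hold."""
    per_server = [0] * n_servers
    for value in client_values:
        for j, s in enumerate(share(value, n_servers)):
            per_server[j] = (per_server[j] + s) % MODULUS
    # Only the combined per-server totals reveal the aggregate.
    return reconstruct(per_server)

total = secure_sum([3, 5, 10])
print(total)
```

Each server sees only uniformly random numbers, yet the combined totals equal the sum of the client values. That sum is all a federated averaging step needs.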
The core tradeoff
Stronger privacy usually means more noise or coordination cost, which can lower accuracy. The goal is to bound individual exposure while keeping aggregate utility.
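The tradeoff can be seen with the Laplace mechanism applied to a counting query. A count has sensitivity 1, so adding Laplace noise with scale 1/epsilon gives epsilon-differential privacy, and a smaller epsilon (stronger privacy) means more noise. This is a minimal sketch: the true count of 100, the trial count, and the function names are assumptions for illustration, and production systems need more careful noise sampling than this.

```python
import random

def private_count(true_count: float, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

def mean_abs_error(epsilon: float, trials: int = 20000, seed: int = 0) -> float:
    """Average absolute error of the noisy count over many simulated releases."""
    rng = random.Random(seed)
    true_count = 100.0  # hypothetical true answer
    errors = (abs(private_count(true_count, epsilon, rng) - true_count)
              for _ in range(trials))
    return sum(errors) / trials

for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps}: mean abs error ~ {mean_abs_error(eps):.2f}")
```

The expected absolute error equals the noise scale, 1/epsilon. Tightening the privacy budget by a factor of ten therefore costs roughly ten times the error on this query.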
Key idea
Privacy-preserving ML defends against membership inference, model inversion, and memorization leakage by using differential privacy, federated learning, and secure computation, trading some accuracy to bound any individual's exposure.