Probabilities that mean something
A model can rank examples well yet still output probabilities that are wrong as numbers. Calibration asks whether predicted probabilities match observed frequencies. If a model predicts seventy percent for a group of examples, about seventy percent of that group should turn out to be positive.
Building the curve
To draw a calibration curve, group the predictions into bins by predicted probability. Then, for each bin, plot the average predicted probability (x-axis) against the observed fraction of positives (y-axis).
- A perfectly calibrated model lies on the diagonal.
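- A curve below the diagonal means positives occur less often than predicted: the probabilities are too high, so the model is overconfident.
- A curve above the diagonal means positives occur more often than predicted: the probabilities are too low, so the model is underconfident.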
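The binning step can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name calibration_curve_points, the choice of equal-width bins, and the toy data are all assumptions made for the example.

```python
def calibration_curve_points(y_true, y_prob, n_bins=10):
    """Return (mean_predicted, fraction_positive, count) per non-empty bin.

    Hypothetical helper: uses equal-width bins on [0, 1].
    """
    sum_prob = [0.0] * n_bins
    sum_pos = [0] * n_bins
    count = [0] * n_bins
    for y, p in zip(y_true, y_prob):
        # Map p to a bin index; p == 1.0 falls into the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        sum_prob[b] += p
        sum_pos[b] += y
        count[b] += 1
    points = []
    for b in range(n_bins):
        if count[b] == 0:
            continue  # skip empty bins, they have no point on the curve
        points.append((sum_prob[b] / count[b], sum_pos[b] / count[b], count[b]))
    return points


# Toy data (made up for illustration): four low and four high predictions.
y_prob = [0.15, 0.12, 0.18, 0.11, 0.85, 0.82, 0.88, 0.81]
y_true = [0, 0, 0, 1, 1, 1, 0, 0]

points = calibration_curve_points(y_true, y_prob, n_bins=5)
for mean_pred, frac_pos, n in points:
    side = "below" if frac_pos < mean_pred else "on or above"
    print(f"predicted {mean_pred:.2f}, observed {frac_pos:.2f}, n={n}: {side} the diagonal")
```

In this toy data the low bin predicts roughly 0.14 but observes 0.25 (above the diagonal), while the high bin predicts roughly 0.84 but observes 0.50 (below the diagonal). With only four examples per bin these fractions are very noisy; real curves need enough examples in each bin to be trusted.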
Fixing miscalibration
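- Platt scaling fits a logistic (sigmoid) function that maps the model's scores to probabilities, using a held-out set.
- Isotonic regression fits a flexible nondecreasing mapping from scores to probabilities. It is more flexible than Platt scaling, so it needs more data to avoid overfitting.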
- Always calibrate using data the model never trained on.
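Both methods can be sketched in plain Python to show what is being fitted. This is a simplified illustration under stated assumptions: the function names, the toy held-out scores, and the gradient-descent settings are invented for the example. The Platt fit here uses the raw 0/1 labels rather than the smoothed targets of the original method, and the isotonic fit assumes distinct scores and returns a step function.

```python
import math


def fit_platt(scores, labels, lr=0.5, steps=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_prob(a, b, score):
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def fit_isotonic(scores, labels):
    """Pool adjacent violators; returns [(upper_score, value), ...]."""
    blocks = []  # each block: [sum_of_labels, count, highest_score]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # Merge while the previous block's mean exceeds the newest block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, hi = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = hi
    return [(hi, total / count) for total, count, hi in blocks]


def apply_isotonic(blocks, score):
    """Step-function lookup; scores past the last block get its value."""
    for hi, value in blocks:
        if score <= hi:
            return value
    return blocks[-1][1]


# Held-out scores and labels (toy data, never used to train the model).
cal_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95]
cal_labels = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_platt(cal_scores, cal_labels)
blocks = fit_isotonic(cal_scores, cal_labels)

for s in (0.10, 0.30, 0.90):
    print(s, round(platt_prob(a, b, s), 3), round(apply_isotonic(blocks, s), 3))
```

Platt scaling has only two parameters, so it stays smooth even on small held-out sets. The isotonic fit follows the data more closely, which is why it produces flat steps and can overfit when few examples are available.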
Key idea
Calibration checks whether predicted probabilities match real frequencies. A calibration curve reveals over- or underconfidence, and methods like Platt scaling or isotonic regression, fitted on held-out data, can correct it.