Calibration Curves

Probabilities that mean something

A model can rank examples well and still output probabilities that are numerically wrong. Calibration asks whether predicted probabilities match observed frequencies: among all examples to which the model assigns a probability of about 0.7, about 70 percent should turn out to be positive.
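
Stated as a formula, the idea above is usually written as follows. The symbols are assumptions introduced here for illustration: X is an example, Y is its binary label, and \hat{p}(X) is the model's predicted probability of the positive class.

```latex
% Perfect calibration: among examples that receive prediction p,
% the fraction of positives is exactly p.
\[
  P\bigl(Y = 1 \mid \hat{p}(X) = p\bigr) = p
  \qquad \text{for all } p \in [0, 1].
\]
```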

Building the curve

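To draw a calibration curve, bin the predictions by predicted probability. For each bin, plot the average predicted probability on the horizontal axis against the observed fraction of positives on the vertical axis.

A perfectly calibrated model lies on the diagonal.
A curve below the diagonal means the predicted probabilities are higher than the observed frequencies: the model is overconfident about the positive class.
A curve above the diagonal means the predicted probabilities are lower than the observed frequencies: the model is underconfident about the positive class.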
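
The binning procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `calibration_curve`, the choice of five equal-width bins, and the toy data are all assumptions made for the example.

```python
def calibration_curve(y_true, y_prob, n_bins=5):
    """Return (mean_predicted, fraction_positive, count) per non-empty bin.

    Uses equal-width bins on [0, 1]; n_bins is an illustrative choice.
    """
    sums_prob = [0.0] * n_bins
    sums_true = [0] * n_bins
    counts = [0] * n_bins
    for y, p in zip(y_true, y_prob):
        # Map the probability to a bin index; p == 1.0 goes in the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        sums_prob[b] += p
        sums_true[b] += y
        counts[b] += 1
    curve = []
    for b in range(n_bins):
        if counts[b] == 0:
            continue  # skip empty bins instead of plotting a made-up point
        curve.append((sums_prob[b] / counts[b],
                      sums_true[b] / counts[b],
                      counts[b]))
    return curve


# Toy data, invented for illustration only.
y_prob = [0.1, 0.1, 0.3, 0.3, 0.7, 0.7, 0.9, 0.9, 0.9, 0.9]
y_true = [0, 0, 0, 1, 1, 0, 1, 1, 1, 0]

for mean_pred, frac_pos, n in calibration_curve(y_true, y_prob):
    side = "below" if frac_pos < mean_pred else "on or above"
    print(f"predicted {mean_pred:.2f}  observed {frac_pos:.2f}  n={n}  ({side} the diagonal)")
```

Each returned tuple is one point on the curve. Keeping the per-bin count alongside it is a deliberate choice: a point backed by only a handful of examples is noisy and should be read with caution.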

Fixing miscalibration

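Platt scaling fits a logistic (sigmoid) function that maps the model's scores to probabilities, using a held-out set.
Isotonic regression fits a flexible nondecreasing mapping, which can correct distortions a sigmoid cannot but needs more data to avoid overfitting.
Always calibrate on data the model never trained on; otherwise the calibrator learns from predictions that look better than they will on new data.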
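
Both methods can be sketched with the standard library alone. The code below is a simplified illustration under stated assumptions. The names `fit_platt`, `fit_isotonic`, and `predict_isotonic` are hypothetical, and the held-out scores and labels are invented. The Platt fit uses plain gradient descent and omits the target smoothing from Platt's original method. The isotonic fit uses the pool-adjacent-violators procedure, assumes no tied scores, and predicts with a step function instead of interpolating.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def fit_isotonic(scores, labels):
    """Pool adjacent violators: returns [(max_score_in_block, value), ...]."""
    blocks = []  # each block holds [sum_of_labels, count, max_score]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([y, 1, s])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    return [(top, total / count) for total, count, top in blocks]


def predict_isotonic(steps, score):
    """Step-function lookup; scores past the last block reuse its value."""
    for top, value in steps:
        if score <= top:
            return value
    return steps[-1][1]


# Hypothetical held-out data: raw model scores and true labels
# that the model never saw during training.
held_out_scores = [-3.0, -2.5, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 2.5, 3.0]
held_out_labels = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_platt(held_out_scores, held_out_labels)
steps = fit_isotonic(held_out_scores, held_out_labels)

for s in (-2.0, 0.0, 2.0):
    print(s, round(sigmoid(a * s + b), 3), round(predict_isotonic(steps, s), 3))
```

The two calibrators differ in how much shape they can express. The sigmoid has only two parameters, so it stays stable on small held-out sets, while the isotonic fit can follow any nondecreasing pattern at the cost of needing more data.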

Key idea

Calibration checks whether predicted probabilities match real frequencies. A calibration curve reveals over- or underconfidence, and methods like Platt scaling or isotonic regression can correct it.
