Model Calibration

What calibration means

A model is calibrated when its confidence matches reality. Among all predictions made with seventy percent confidence, about seventy percent should be correct. Accuracy and calibration are separate properties: a model can be highly accurate yet poorly calibrated, reporting ninety-nine percent confidence when it is right only eighty percent of the time.
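
Stated a little more formally, in notation that is an assumption of this sketch rather than part of the original notes: let \hat{Y} be the predicted class, \hat{P} the confidence reported for it, and Y the true label. Perfect calibration is then the condition

```latex
\[
\Pr\left(\hat{Y} = Y \mid \hat{P} = p\right) = p
\quad \text{for all } p \in [0, 1].
\]
```

For the overconfident model just described, the left side is about 0.80 when p = 0.99, a gap of 0.19.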

Why it matters

Probabilities feed downstream decisions, risk thresholds, and human trust. Overconfident outputs lead to bad automated choices and underestimated risk. Modern deep networks tend to be overconfident by default.

Measuring it

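A reliability diagram groups predictions into confidence bins and plots each bin's predicted confidence against its observed accuracy
A perfectly calibrated model lies on the diagonal of that diagram
The expected calibration error (ECE) averages the absolute gap between confidence and accuracy across the bins, weighting each bin by the share of predictions it holds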
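
The measurements above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `reliability_bins` and `expected_calibration_error`, the choice of ten equal-width bins, and the toy data are all assumptions made for the example.

```python
def reliability_bins(confidences, correct, n_bins=10):
    """Group predictions into equal-width confidence bins.

    Returns (mean_confidence, accuracy, count) for every non-empty bin.
    These are the points a reliability diagram would plot.
    """
    sums = [[0.0, 0, 0] for _ in range(n_bins)]  # [confidence sum, hits, count]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        sums[idx][0] += conf
        sums[idx][1] += int(hit)
        sums[idx][2] += 1
    return [(s / n, h / n, n) for s, h, n in sums if n > 0]


def expected_calibration_error(confidences, correct, n_bins=10):
    """Sample-weighted average of |accuracy - confidence| over the bins."""
    total = len(confidences)
    bins = reliability_bins(confidences, correct, n_bins)
    return sum(n / total * abs(acc - conf) for conf, acc, n in bins)


# Toy data (assumed): the overconfident model described earlier,
# ninety-nine percent confident but right eighty percent of the time.
confidences = [0.99] * 100
correct = [True] * 80 + [False] * 20

ece = expected_calibration_error(confidences, correct)
print(f"ECE = {ece:.2f}")  # the gap between 0.99 confidence and 0.80 accuracy
```

With every prediction in one bin, the ECE is simply the distance of that bin from the diagonal. The result depends on the binning scheme, so ECE values are only comparable when the bins are chosen the same way.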

Fixing it

Calibration is usually a cheap post-processing step fit on held-out data.

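Temperature scaling divides the logits by a single learned temperature, softening or sharpening the probabilities without changing the predicted class, because dividing every logit by the same positive number preserves their order
Platt scaling fits a logistic (sigmoid) transform, with a slope and an offset, to the scores
Isotonic regression fits a flexible monotonic (non-decreasing) mapping from score to probability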
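
To make the last of these concrete, here is a rough sketch of isotonic regression using the pool-adjacent-violators idea: sort by score, then merge neighbouring groups whenever their observed accuracy would otherwise decrease. The helper names `fit_isotonic` and `apply_isotonic` and the toy data are hypothetical, and the lookup is a plain step function, whereas library implementations often interpolate between the fitted points.

```python
def fit_isotonic(scores, outcomes):
    """Fit a non-decreasing step function from score to probability.

    Returns a list of (upper_score_bound, calibrated_value) blocks,
    sorted by score.
    """
    # Pool identical scores first so that ties share one value.
    grouped = {}
    for score, outcome in zip(scores, outcomes):
        total, count = grouped.get(score, (0.0, 0))
        grouped[score] = (total + outcome, count + 1)

    blocks = []  # each block: [sum of outcomes, count, largest score]
    for score in sorted(grouped):
        total, count = grouped[score]
        blocks.append([total, count, score])
        # Merge backwards while a block's mean exceeds its successor's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    return [(top, total / count) for total, count, top in blocks]


def apply_isotonic(blocks, score):
    """Return the value of the first block whose upper bound covers the score."""
    for top, value in blocks:
        if score <= top:
            return value
    return blocks[-1][1]  # scores above the fitted range use the last block


# Toy held-out data (assumed): raw scores and whether each prediction was right.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 1, 0, 0, 1, 1, 0, 1]

blocks = fit_isotonic(scores, outcomes)
for top, value in blocks:
    print(f"score <= {top}: calibrated probability {value:.2f}")
```

Because the mapping has many free values rather than one or two parameters, it can follow irregular miscalibration, but it also needs more held-out data to avoid overfitting.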

Temperature scaling is the most popular because it is simple, leaves accuracy untouched, and needs only one parameter tuned on a validation set.
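
As a hedged sketch of what tuning that one parameter can look like, the code below picks the temperature that minimizes the negative log-likelihood on a small validation set. The function names, the golden-section search, the search range, and the toy logits are all assumptions made for illustration; in practice the temperature is often fit with a gradient-based optimizer instead.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities of logits / temperature, computed stably."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    log_total = top + math.log(sum(math.exp(z - top) for z in scaled))
    return [z - log_total for z in scaled]


def softmax(logits, temperature=1.0):
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def mean_nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for logits, label in zip(logit_rows, labels):
        total -= log_softmax(logits, temperature)[label]
    return total / len(labels)


def fit_temperature(logit_rows, labels, low=0.05, high=20.0, iterations=60):
    """Golden-section search over log-temperature for the lowest NLL."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = math.log(low), math.log(high)
    for _ in range(iterations):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if mean_nll(logit_rows, labels, math.exp(c)) < mean_nll(
                logit_rows, labels, math.exp(d)):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return math.exp((a + b) / 2)


# Toy validation set (assumed): the model always outputs logits [4, 0],
# about 98 percent confidence in class 0, but class 0 is right 8 times in 10.
val_logits = [[4.0, 0.0]] * 10
val_labels = [0] * 8 + [1] * 2

temperature = fit_temperature(val_logits, val_labels)
before = max(softmax([4.0, 0.0]))
after = max(softmax([4.0, 0.0], temperature))
print(f"T = {temperature:.3f}, confidence {before:.3f} -> {after:.3f}")
```

On this toy data the fitted temperature is greater than one, which softens the confidence toward the observed eighty percent accuracy while the predicted class stays the same.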

Key idea

Calibration aligns predicted confidence with real accuracy, often via cheap post hoc methods like temperature scaling.
