What calibration means
A model is calibrated when its predicted probabilities match observed frequencies. Among the cases to which it assigns a probability of 0.8, about 80% should truly be positive. Calibration is separate from accuracy: a model can rank cases well yet report misleading confidence.
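To make the gap between ranking and calibration concrete, here is a minimal sketch on made-up data. The names `auc`, `observed_rate`, `honest`, and `extreme` are hypothetical, chosen only for this illustration. Two sets of predictions order the cases identically, so they rank equally well, but only one of them reports probabilities that match the observed positive rate.

```python
def auc(probs, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def observed_rate(probs, labels, value):
    """Observed positive rate among cases predicted at exactly `value`."""
    hits = [y for p, y in zip(probs, labels) if p == value]
    return sum(hits) / len(hits)


# Illustrative data: 1 of the first 5 cases is positive, 4 of the last 5 are.
labels = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
honest = [0.2] * 5 + [0.8] * 5       # stated confidence matches the rates
extreme = [0.01] * 5 + [0.99] * 5    # same ordering, inflated confidence

# Identical ordering gives identical ranking quality.
print("AUC honest :", auc(honest, labels))
print("AUC extreme:", auc(extreme, labels))

# The observed rate in the top group is the same either way, but only the
# honest model's stated probability (0.8) agrees with it.
print("rate where honest says 0.8  :", observed_rate(honest, labels, 0.8))
print("rate where extreme says 0.99:", observed_rate(extreme, labels, 0.99))
```

The second model is exactly as good at ordering cases, yet its 0.99 predictions are right far less often than 99% of the time, which is the sense in which ranking and calibration come apart.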
Reading the curve
The calibration curve, or reliability diagram, groups predictions into bins by predicted probability and plots, for each bin, the mean predicted probability (horizontal axis) against the observed positive rate (vertical axis).
- On the diagonal means well calibrated.
- Below the diagonal means overconfident: the predicted probability is higher than the observed rate.
- Above the diagonal means underconfident: the predicted probability is lower than the observed rate.
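The binning behind the curve can be sketched in a few lines. This is an illustrative version with equal-width bins, not a reference implementation; the function name `reliability_curve`, the `n_bins` parameter, and the toy data are assumptions made for the example.

```python
def reliability_curve(probs, labels, n_bins=10):
    """Return (mean_predicted, observed_rate, count) for each non-empty bin.

    Assumes probabilities lie in [0, 1] and uses equal-width bins.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))

    curve = []
    for members in bins:
        if not members:
            continue  # empty bins contribute no point to the curve
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        curve.append((mean_pred, observed, len(members)))
    return curve


# Toy data: the model says 0.9 for five cases, but only three are positive.
probs = [0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

for mean_pred, observed, count in reliability_curve(probs, labels, n_bins=5):
    if observed < mean_pred:
        position = "below the diagonal (predicts too high)"
    elif observed > mean_pred:
        position = "above the diagonal (predicts too low)"
    else:
        position = "on the diagonal"
    print(f"predicted {mean_pred:.2f}, observed {observed:.2f}, n={count}: {position}")
```

With only a handful of cases per bin the observed rates are noisy, so in practice the bin count is a trade-off between resolution and stable estimates.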
Fixing miscalibration
Many strong classifiers are poorly calibrated out of the box, especially boosted trees and deep networks. Two common repairs fit a small post-hoc mapping from the model's scores to probabilities, using held-out data:
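- Platt scaling fits a logistic (sigmoid) function with two parameters, a slope and an intercept, to the scores.
- Isotonic regression fits a flexible non-decreasing mapping; because it has more freedom, it suits cases where more held-out data is available.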
Both mappings are monotonic, so they leave the ranking intact (isotonic regression can introduce ties) while pulling probabilities toward the observed frequencies.
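Both repairs can be sketched without any library, which may help show what is actually being fitted. The code below is a simplified illustration under stated assumptions: Platt scaling is fitted by plain gradient descent on log loss, isotonic regression uses the pool-adjacent-violators procedure and predicts with a step function, and the function names, learning rate, step count, and held-out data are all made up for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit slope a and intercept b of sigmoid(a * score + b) by gradient descent."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def fit_isotonic(scores, labels):
    """Pool adjacent violators; returns [(right_edge, value), ...] sorted by score."""
    blocks = []  # each block is [label_sum, count, right_edge]
    for s, y in sorted(zip(scores, labels)):
        if blocks and blocks[-1][2] == s:  # pool tied scores first
            blocks[-1][0] += y
            blocks[-1][1] += 1
        else:
            blocks.append([float(y), 1, s])
    merged = []
    for total, count, right in blocks:
        merged.append([total, count, right])
        # Merge backwards while the previous block's mean exceeds this one's.
        while len(merged) > 1 and merged[-2][0] / merged[-2][1] > merged[-1][0] / merged[-1][1]:
            t, c, r = merged.pop()
            merged[-1][0] += t
            merged[-1][1] += c
            merged[-1][2] = r
    return [(right, total / count) for total, count, right in merged]


def predict_isotonic(steps, score):
    """Step-function lookup: value of the first block whose right edge covers score."""
    for right, value in steps:
        if score <= right:
            return value
    return steps[-1][1]  # scores beyond the fitted range get the last value


# Hypothetical held-out raw scores and true labels.
scores = [-3.0, -2.5, -2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
labels = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

a, b = fit_platt(scores, labels)
platt_probs = [sigmoid(a * s + b) for s in scores]

iso_steps = fit_isotonic(scores, labels)
iso_probs = [predict_isotonic(iso_steps, s) for s in scores]

for s, pp, ip in zip(scores, platt_probs, iso_probs):
    print(f"score {s:+.1f}  platt {pp:.3f}  isotonic {ip:.3f}")
```

The Platt output is strictly increasing in the score whenever the fitted slope is positive, so the ordering of cases is unchanged. The isotonic output is non-decreasing but piecewise constant, which is where ties can appear.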
Key idea
A calibration curve compares predicted probabilities to observed frequencies, and Platt scaling or isotonic regression can correct overconfident or underconfident models without changing their ranking.