Machine Learning

INT8 Calibration

Choosing the right scale by observing real activation ranges.

The scale problem

To quantize activations to 8-bit integers you must pick a scale that maps real values onto only 256 levels. If the range the scale covers is too wide, you waste levels on rare extremes; if it is too narrow, you clip common values. Calibration finds a good scale by watching real data.
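
The trade-off can be seen on a handful of numbers. The sketch below assumes symmetric signed quantization (integer levels from -128 to 127, with the chosen range mapped onto 127 levels per side), which is one common convention and not something the lesson specifies. The helper names `quantize`, `dequantize`, and `max_error` are illustrative.

```python
import numpy as np

def quantize(x, scale):
    """Round x / scale to the nearest integer and clamp to the signed 8-bit range."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    """Map integer levels back to approximate real values."""
    return q.astype(np.float64) * scale

def max_error(x, max_abs):
    """Worst round-trip error when the range [-max_abs, max_abs] is mapped onto INT8."""
    scale = max_abs / 127.0
    return float(np.max(np.abs(dequantize(quantize(x, scale), scale) - x)))

x = np.array([-1.0, -0.25, 0.0, 0.4, 1.0])

fitted = max_error(x, 1.0)       # range matches the data
too_wide = max_error(x, 10.0)    # most levels cover values that never occur
too_narrow = max_error(x, 0.5)   # values beyond 0.5 are clipped

print(fitted, too_wide, too_narrow)
```

On this toy input the fitted range gives the smallest error, the wide range loses resolution, and the narrow range clips the values at 1.0 and -1.0 to roughly half their size.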

How calibration works

A small representative calibration dataset is run through the model while the quantizer records the distribution of activations at each layer.

  • Min-max uses the observed extremes as the range.
  • Percentile clips a tiny fraction of outliers to spend levels on the bulk.
  • KL divergence chooses the range that best preserves the activation distribution.
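
The first two strategies can be sketched in a few lines. This assumes a symmetric range, so each function returns a single largest-magnitude bound, and the function names and the 99.9 default are illustrative choices. The KL divergence search is not shown here because it needs a histogram and a loop over candidate thresholds.

```python
import numpy as np

def minmax_range(acts):
    """Min-max: use the largest observed magnitude as the range bound."""
    return float(np.max(np.abs(acts)))

def percentile_range(acts, pct=99.9):
    """Percentile: ignore the largest (100 - pct) percent of magnitudes."""
    return float(np.percentile(np.abs(acts), pct))

def scale_from_range(max_abs):
    """Symmetric INT8 scale: the bound maps onto integer level 127."""
    return max_abs / 127.0

rng = np.random.default_rng(0)
# Synthetic activations: a bell-shaped bulk plus one rare extreme value.
acts = np.concatenate([rng.normal(0.0, 1.0, size=10_000), [60.0]])

print("min-max bound:   ", minmax_range(acts))
print("percentile bound:", percentile_range(acts))
```

With a single extreme value in the data, the min-max bound follows that value while the percentile bound stays close to the bulk.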

The procedure
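
A minimal sketch of the calibration pass described above, using min-max ranges. The two-layer network, its random weights, and the names `forward` and `calibrate` are hypothetical stand-ins for a real model and quantizer; real toolkits attach observers to layers in their own way.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer network with fixed random weights.
W1 = rng.normal(size=(16, 32))
W2 = rng.normal(size=(32, 8))

def forward(x):
    """Run the toy model and return each layer's activations by name."""
    h1 = np.maximum(x @ W1, 0.0)  # ReLU
    h2 = h1 @ W2
    return {"layer1": h1, "layer2": h2}

def calibrate(batches):
    """Track the largest activation magnitude per layer, then derive scales."""
    max_abs = {}
    for x in batches:
        for name, act in forward(x).items():
            seen = float(np.max(np.abs(act)))
            max_abs[name] = max(max_abs.get(name, 0.0), seen)
    # One symmetric scale per layer: the observed bound maps onto level 127.
    return {name: m / 127.0 for name, m in max_abs.items()}

# Stand-in for a small representative calibration dataset.
calibration_batches = [rng.normal(size=(64, 16)) for _ in range(10)]
scales = calibrate(calibration_batches)

for name, scale in scales.items():
    print(name, scale)
```

Each layer ends up with its own scale because the two layers see activations of different magnitudes.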

Why outliers matter

A few large activations can stretch the range so much that ordinary values collapse into a handful of levels. Clipping those outliers, accepting a little error on them, often improves overall accuracy. The calibration set should resemble production data so the chosen ranges generalize.
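
The collapse into a handful of levels is easy to reproduce on synthetic data. The sketch below assumes symmetric INT8 quantization and invented activation values, and it measures error on the ordinary values only. Whether clipping helps end-to-end accuracy depends on the model and on how much the clipped outliers matter.

```python
import numpy as np

rng = np.random.default_rng(0)
bulk = rng.normal(0.0, 1.0, size=10_000)      # ordinary activations
outliers = np.array([100.0, -80.0, 90.0])     # a few rare extremes
acts = np.concatenate([bulk, outliers])

def quantize_range(x, max_abs):
    """Symmetric INT8 quantization over the range [-max_abs, max_abs]."""
    scale = max_abs / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q, scale

def bulk_report(max_abs):
    """Count the integer levels the bulk occupies and its mean absolute error."""
    q, scale = quantize_range(bulk, max_abs)
    levels_used = len(np.unique(q))
    mean_abs_err = float(np.mean(np.abs(q * scale - bulk)))
    return levels_used, mean_abs_err

# Range taken from the extremes versus from the 99.9th percentile of magnitudes.
minmax_levels, minmax_err = bulk_report(np.max(np.abs(acts)))
clip_levels, clip_err = bulk_report(np.percentile(np.abs(acts), 99.9))

print(f"min-max:    {minmax_levels} levels, mean abs error {minmax_err:.4f}")
print(f"percentile: {clip_levels} levels, mean abs error {clip_err:.4f}")
```

With the min-max range, the three outliers set the scale and the bulk lands on only a few integer levels. With the percentile range, the bulk spreads across most of the available levels and its error drops accordingly.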

Key idea

INT8 calibration runs representative data through the model to measure activation ranges and pick per-layer scales that balance clipping against wasted resolution.

Check yourself

1. What is the purpose of a calibration dataset?

2. Why might clipping outliers improve accuracy?