Machine Learning

Quantization-Aware Training

Simulating low precision during training so the final quantized model stays accurate.

Why post-training quantization can hurt

Quantizing a trained model to low precision after the fact rounds every weight and activation to the nearest representable value. At aggressive bit widths the rounding error can noticeably reduce accuracy, because the model was never trained to tolerate it.
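To make the effect of bit width concrete, here is a minimal sketch of quantizing a handful of weights after training and measuring the rounding error. The helper names (`fake_quantize`, `mean_abs_error`), the symmetric per-tensor scheme, and the weight values are assumptions chosen for illustration, not something the lesson prescribes.

```python
def fake_quantize(values, num_bits):
    """Symmetric uniform quantize-then-dequantize of a list of floats."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits, 1 for 2 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    out = []
    for v in values:
        q = round(v / scale)                  # snap to the integer grid
        q = max(-qmax, min(qmax, q))          # clip to the representable range
        out.append(q * scale)                 # map back to floating point
    return out


def mean_abs_error(a, b):
    """Average absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Made-up "trained" weights, for illustration only.
weights = [0.91, -0.42, 0.07, 0.33, -0.88, 0.15]

err_8bit = mean_abs_error(weights, fake_quantize(weights, 8))
err_2bit = mean_abs_error(weights, fake_quantize(weights, 2))
print("8-bit error:", err_8bit)
print("2-bit error:", err_2bit)
```

With 8 bits the grid is fine enough that each weight moves by at most half a step. With 2 bits most of these weights collapse to zero, which is the kind of error a model that never saw quantization has no way to compensate for.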

The QAT idea

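Quantization-aware training (QAT) inserts fake quantization into the forward pass during training: values are quantized and immediately dequantized, so they stay in floating point but carry the rounding error of the target precision.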

  • Weights and activations are rounded to the target precision in the forward pass.
  • The model sees the rounding error and learns to be robust to it.
  • In the backward pass, gradients flow through the rounding step via a straight-through estimator, since rounding has zero gradient almost everywhere.
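The forward and backward behavior in the list above can be sketched for a single scalar. The function names, the 4-bit signed grid, and the choice to zero the gradient for clipped inputs are assumptions of this sketch; frameworks differ in the details.

```python
def fake_quant_forward(x, scale, qmin, qmax):
    """Forward pass: quantize then dequantize, staying in floating point."""
    q = round(x / scale)
    q = max(qmin, min(qmax, q))
    return q * scale


def fake_quant_backward_ste(x, scale, qmin, qmax, grad_output):
    """Backward pass with a straight-through estimator.

    Rounding is treated as the identity, so the incoming gradient passes
    through unchanged. Zeroing it for inputs outside the clipping range is
    a common variant and an assumption here.
    """
    inside = qmin * scale <= x <= qmax * scale
    return grad_output if inside else 0.0


scale, qmin, qmax = 0.1, -8, 7                # hypothetical 4-bit signed grid
x = 0.234
y = fake_quant_forward(x, scale, qmin, qmax)  # x snapped to the grid

# The true derivative of the rounding step is zero between grid boundaries.
eps = 1e-4
numeric_grad = (
    fake_quant_forward(x + eps, scale, qmin, qmax)
    - fake_quant_forward(x - eps, scale, qmin, qmax)
) / (2 * eps)

# The straight-through estimator reports a usable gradient instead.
ste_grad = fake_quant_backward_ste(x, scale, qmin, qmax, grad_output=1.0)
print(y, numeric_grad, ste_grad)
```

The finite-difference gradient is zero because a small change in `x` does not move it to a different grid point, so plain backpropagation would give the optimizer no signal. The estimator substitutes the gradient of the identity so training can proceed.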

Key mechanics

  • The straight-through estimator (STE) passes gradients as if quantization were the identity.
  • Quantization ranges are calibrated or learned so the scale matches the data.
  • Training often starts from a full-precision checkpoint and fine-tunes with fake quantization.
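As one illustration of range calibration, the sketch below derives a scale and zero point from the minimum and maximum of a calibration batch, using unsigned affine quantization. The function names, the min/max rule, and the sample activations are assumptions; other calibration rules (percentiles, moving averages, learned scales) are also used in practice.

```python
def calibrate_affine_range(samples, num_bits=8):
    """Derive (scale, zero_point) for unsigned affine quantization from data."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(samples), 0.0)
    hi = max(max(samples), 0.0)
    if hi == lo:                              # all-zero input: any scale works
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)     # integer that maps back to 0.0
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def fake_quantize_affine(x, scale, zero_point, num_bits=8):
    """Quantize then dequantize one value with the calibrated parameters."""
    qmin, qmax = 0, 2 ** num_bits - 1
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))
    return (q - zero_point) * scale


# Made-up calibration batch of activations, for illustration only.
activations = [-1.0, -0.2, 0.0, 0.4, 1.5, 3.0]
scale, zero_point = calibrate_affine_range(activations)
reconstructed = [fake_quantize_affine(a, scale, zero_point) for a in activations]
print(scale, zero_point)
print(reconstructed)
```

Because the scale is fitted to the observed range, every sample here lands within half a quantization step of its original value. A range that is too wide wastes grid points, and one that is too narrow clips large values.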

Trade-offs

QAT typically recovers most of the accuracy lost at low bit widths, but it costs extra training compute and adds complexity to the training pipeline. When post-training quantization already meets the accuracy target, the simpler path is preferred.

Key idea

Quantization-aware training simulates low precision in the forward pass with fake quantization and uses a straight-through estimator in the backward pass, so the model adapts to rounding error and the final quantized model keeps its accuracy.

Check yourself

1. What does quantization-aware training insert into the forward pass?

2. Why is a straight-through estimator needed?

3. When is plain post-training quantization preferred over QAT?