Quantization-Aware Training

Simulating low precision during training so the final quantized model stays accurate.

Why post-training quantization can hurt accuracy

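Quantizing a trained model to low precision after the fact rounds every weight and activation onto a coarse grid of representable values. At aggressive bit widths (for instance 4-bit rather than 8-bit integers), the grid is coarse enough that the rounding error can noticeably reduce accuracy, because the model was never trained to tolerate it.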
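To make the growth of the rounding error concrete, here is a minimal sketch in plain Python. It assumes symmetric per-tensor quantization with the scale chosen so the largest magnitude maps to the top integer level, which is one common scheme among several. The function names are hypothetical, and the weights are synthetic Gaussian values rather than ones taken from a real model.

```python
import random


def quantize_dequantize(values, num_bits):
    """Symmetric per-tensor quantization to signed integers, then back to float.
    Assumes at least one value is nonzero (otherwise the scale would be 0)."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    scale = max(abs(v) for v in values) / qmax  # largest magnitude maps exactly to qmax
    # round() snaps each value to the nearest grid point; the clamp guards the edges.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mean_abs_error(a, b):
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(10_000)]  # synthetic "trained" weights

for bits in (8, 4, 2):
    err = mean_abs_error(weights, quantize_dequantize(weights, bits))
    print(f"{bits}-bit mean |rounding error|: {err:.6f}")
```

Each halving of the bit width shrinks the number of grid levels exponentially, so the step size, and with it the typical per-weight error, grows quickly; the model has no chance to compensate because the rounding happens after training is over.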

The QAT idea

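Quantization-aware training (QAT) inserts fake quantization into the forward pass during training: values are quantized and immediately dequantized, so they remain floating-point numbers but carry exactly the rounding error the deployed low-precision model will see.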

Weights and activations are rounded to the target precision in the forward pass.
The model sees the rounding error and learns to be robust to it.
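In the backward pass, gradients flow through the rounding step via a straight-through estimator, because rounding has zero gradient almost everywhere (and none at the jumps) and would otherwise block learning.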
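The three steps above can be sketched for a single scalar weight. This is a minimal illustration in plain Python, assuming a signed 4-bit grid [-8, 7] with zero point 0 and a fixed scale of 0.1; the names `fake_quantize` and `ste_grad` are hypothetical, and real frameworks apply the same idea per tensor inside their autograd machinery.

```python
def fake_quantize(x, scale, zero_point, qmin, qmax):
    """Forward pass: round onto the integer grid, clamp to [qmin, qmax], map back to float.
    The output is still a float, but it now carries the rounding error."""
    q = min(qmax, max(qmin, round(x / scale) + zero_point))
    return (q - zero_point) * scale


def ste_grad(x, grad_out, scale, zero_point, qmin, qmax):
    """Backward pass with a straight-through estimator: treat rounding as the identity
    (derivative 1), but pass no gradient where the value was clamped, a common variant."""
    inside = qmin <= round(x / scale) + zero_point <= qmax
    return grad_out if inside else 0.0


# Hypothetical setup: signed 4-bit grid [-8, 7], zero point 0, fixed scale 0.1.
SCALE, ZERO_POINT, QMIN, QMAX = 0.1, 0, -8, 7

w = 0.37                     # latent full-precision weight updated by the optimizer
x, y, lr = 2.0, 0.74, 0.05   # one training example (target 0.37 * 2.0) and step size

# Forward: the model uses the fake-quantized weight, so it sees 0.4 instead of 0.37.
w_q = fake_quantize(w, SCALE, ZERO_POINT, QMIN, QMAX)
loss = (w_q * x - y) ** 2

# Backward: compute dLoss/dw_q, then pass it straight through to the latent weight.
grad_wq = 2 * (w_q * x - y) * x
grad_w = ste_grad(w, grad_wq, SCALE, ZERO_POINT, QMIN, QMAX)

# Update the float weight; it is re-quantized on the next forward pass.
w -= lr * grad_w
```

Because `w` moves in full precision while the forward pass only ever sees grid values, small updates accumulate until the weight crosses a rounding boundary, which is how the loss can steer the model around the quantization error.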

Key mechanics

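The straight-through estimator treats quantization as the identity in the backward pass, passing gradients through unchanged; a common variant zeroes the gradient for values that were clipped outside the quantization range.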
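Quantization ranges (the scale and zero point) are either calibrated from observed statistics, such as the running minimum and maximum of activations, or learned as trainable parameters, so that the integer grid covers the values the data actually produces.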
Training often starts from a full-precision checkpoint and fine-tunes it with fake quantization enabled, rather than training the quantized model from scratch.
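Range calibration, as described in the list above, can be sketched with a running min/max observer. This is one simple strategy among several: percentile- and MSE-based calibration are common alternatives, and learned-scale methods instead treat the scale as a trainable parameter, which this sketch does not show. The class and function names are hypothetical, the activation batches are made-up numbers, and the code assumes an unsigned 8-bit affine grid [0, 255].

```python
class RunningMinMax:
    """Hypothetical min/max observer: tracks the extreme values seen across batches."""

    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def observe(self, batch):
        self.lo = min(self.lo, min(batch))
        self.hi = max(self.hi, max(batch))


def affine_qparams(lo, hi, qmin=0, qmax=255):
    """Scale and zero point mapping [lo, hi] onto the integer range [qmin, qmax].
    The range is widened to include 0.0 so that zero is exactly representable."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = max(qmin, min(qmax, qmin - round(lo / scale)))
    return scale, zero_point


def fake_quantize(x, scale, zero_point, qmin, qmax):
    """Quantize with the calibrated parameters, clamp, and dequantize back to float."""
    q = min(qmax, max(qmin, round(x / scale) + zero_point))
    return (q - zero_point) * scale


# Calibrate on a few made-up activation batches.
obs = RunningMinMax()
for batch in ([-0.5, 0.2, 1.1], [-1.0, 0.4, 3.0], [0.0, 2.5, -0.3]):
    obs.observe(batch)

scale, zp = affine_qparams(obs.lo, obs.hi)   # 8-bit unsigned grid [0, 255]
```

Here the observed range [-1.0, 3.0] gives a scale of 4/255 and a zero point of 64, so 0.0 round-trips exactly, in-range values are off by at most half a step, and anything beyond 3.0 is clamped to the top of the grid.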

Trade-offs

At low bit widths, QAT typically recovers most of the accuracy lost to quantization, but it costs extra training compute and adds complexity to the training pipeline. When post-training quantization already meets the accuracy target, that simpler path is usually preferred.

Key idea