Machine Learning

Quantization-Aware Training

Simulate low-bit math during training so the model learns to tolerate it.

5 min read · advanced

Why it exists

Post-training quantization (PTQ) is easy to apply, but it can lose accuracy at very low bit widths. Quantization-aware training (QAT) instead simulates quantization during training, so the network adapts its weights to the rounding error before deployment.
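To make the bit-width effect concrete, here is a rough sketch of the worst-case rounding error of a uniform quantizer. The function name `max_rounding_error` and the value range [-1, 1] are assumptions chosen for illustration, not part of any particular library.

```python
def max_rounding_error(num_bits, lo=-1.0, hi=1.0):
    """Worst-case rounding error of a uniform quantizer over [lo, hi]."""
    levels = 2 ** num_bits            # number of representable values
    step = (hi - lo) / (levels - 1)   # spacing between adjacent levels
    return step / 2                   # rounding is off by at most half a step


# Fewer bits means a coarser grid and a larger worst-case error.
for bits in (8, 4, 2):
    print(bits, max_rounding_error(bits))
```

Each bit removed roughly doubles the step size, which is why accuracy tends to degrade fastest at the lowest bit widths.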

Fake quantization

QAT inserts fake-quantize operations into the graph. In the forward pass, weights and activations are rounded to the target precision, so the model sees the same coarse values it will use at inference. The rounded values are still stored as floating-point numbers, so the rest of the graph runs unchanged while the loss reflects the quantized behavior.
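A fake-quantize operation can be sketched in plain Python as quantize-then-dequantize. This is a minimal illustration assuming a symmetric signed integer grid with a fixed, hand-picked `scale`; the name `fake_quantize` and its parameters are hypothetical, and real frameworks typically calibrate or learn the scale.

```python
def fake_quantize(x, scale, num_bits=4):
    """Round x onto a signed integer grid, then map it back to float."""
    qmax = 2 ** (num_bits - 1) - 1    # e.g. 7 for 4 bits
    qmin = -qmax - 1                  # e.g. -8 for 4 bits
    q = round(x / scale)              # nearest integer level
    q = max(qmin, min(qmax, q))       # clamp to the representable range
    return q * scale                  # dequantize: still a Python float


print(fake_quantize(0.3, scale=0.25))   # snaps to a nearby grid point
print(fake_quantize(5.0, scale=0.25))   # out of range, so it saturates
```

The return value is an ordinary float, but it can only take one of 16 values when `num_bits=4`, which is the coarse behavior the training loss gets to see.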

The gradient problem

Rounding has a derivative of zero almost everywhere, so backpropagating through it exactly would produce zero gradients and block learning. QAT uses the straight-through estimator (STE): in the backward pass it treats the rounding step as the identity and passes the gradient through unchanged. This lets the optimizer keep adjusting the underlying float weights even though the forward pass is quantized.
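The effect of the straight-through estimator can be seen on a toy problem with a single weight and a hand-written gradient. This is only a sketch under stated assumptions: the model `y = w_q * x`, the squared-error loss, the fixed `scale` of 0.25, and the names `fake_quantize` and `train` are all made up for illustration.

```python
def fake_quantize(w, scale=0.25, num_bits=4):
    """Quantize-then-dequantize on a signed integer grid (assumed fixed scale)."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -qmax - 1
    q = max(qmin, min(qmax, round(w / scale)))
    return q * scale


def train(w, x, target, use_ste, lr=0.1, steps=20):
    """Gradient descent on (w_q * x - target)^2 for one float weight w."""
    for _ in range(steps):
        w_q = fake_quantize(w)            # forward pass uses the quantized weight
        err = w_q * x - target
        grad_wq = 2 * err * x             # dLoss/dw_q
        # True d(w_q)/d(w) is 0 almost everywhere; the STE pretends it is 1.
        dwq_dw = 1.0 if use_ste else 0.0
        w = w - lr * grad_wq * dwq_dw     # update the underlying float weight
    return w


w_ste = train(0.1, x=1.0, target=0.75, use_ste=True)
w_exact = train(0.1, x=1.0, target=0.75, use_ste=False)
print(fake_quantize(w_ste), fake_quantize(w_exact))
```

With the exact zero gradient the weight never moves from its starting value, while the STE version lets the float weight drift until its quantized value matches the target.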

Cost and payoff

QAT needs a full training or fine-tuning run and is more complex to set up than PTQ. The payoff is usually markedly better accuracy at low precision, which matters most for four-bit or aggressive eight-bit deployment.

Key idea

QAT inserts fake quantization in the forward pass and uses a straight-through estimator in the backward pass, so the model learns to tolerate low-bit math and keeps more accuracy than post-training quantization.

Check yourself

1. How does QAT differ from post-training quantization?

2. Why is the straight-through estimator needed in QAT?

3. What is the main cost of QAT compared to PTQ?