Machine Learning

GPTQ and AWQ Quantization

Smarter post-training quantization methods that protect the weights that matter most.

Beyond naive rounding

Rounding every weight to the nearest quantization level ignores how much each weight affects the output. GPTQ and AWQ are post-training methods that quantize a finished model more carefully, using a small calibration dataset, so quality holds up even at 4 bits.
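For reference, naive rounding can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a symmetric 4-bit grid with one scale per output row; the function name `quantize_rtn` and the random weight matrix are made up for the example.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Symmetric round-to-nearest quantization, one scale per output row.

    Returns the dequantized weights so they can be compared with the originals.
    """
    qmax = 2 ** (bits - 1) - 1                       # 7 for 4 bits
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)         # guard against all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)  # integers in [-8, 7]
    return q * scale

# Hypothetical weight matrix: 8 output rows, 16 input columns.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))
w_q = quantize_rtn(w)
print("max absolute rounding error:", np.abs(w - w_q).max())
```

Every weight is treated the same way here: the only thing that decides its rounded value is its own magnitude relative to the row maximum.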

GPTQ

GPTQ quantizes each layer's weight matrix one column at a time and compensates for the error as it goes. After rounding a column, it adjusts the remaining unquantized weights to absorb the error just introduced, using second-order (curvature) information estimated from the layer's inputs on the calibration data. This keeps the layer output close to the original.
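The column-by-column compensation can be sketched as follows. This is a simplified illustration, not a full implementation: it assumes a single linear layer `y = x @ w.T`, a fixed per-row 4-bit grid, and a damped curvature matrix built from the calibration inputs. The names `gptq_sketch`, `round_to_grid`, and `layer_error` and the toy data are hypothetical, and the blocking and grouping used in practice are left out.

```python
import numpy as np

def round_to_grid(w, scale, bits=4):
    """Round to a symmetric integer grid and map back to floats."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def gptq_sketch(w, x, bits=4, damp=0.01):
    """Quantize w (out, in) column by column with error compensation.

    x is calibration input of shape (samples, in).
    """
    w = w.astype(np.float64).copy()
    n_in = w.shape[1]
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1) / qmax             # one scale per output row
    scale[scale == 0] = 1.0

    # Curvature of the layer's squared output error (up to a constant factor).
    h = x.T @ x
    h += damp * np.mean(np.diag(h)) * np.eye(n_in)   # damping for stability

    # Upper Cholesky factor u of the inverse curvature: inv(h) = u.T @ u.
    u = np.linalg.cholesky(np.linalg.inv(h)).T

    q = np.zeros_like(w)
    for i in range(n_in):
        q[:, i] = round_to_grid(w[:, i], scale, bits)
        err = (w[:, i] - q[:, i]) / u[i, i]
        # Push the rounding error onto the columns not yet quantized.
        w[:, i:] -= np.outer(err, u[i, i:])
    return q

# Toy calibration data with correlated input features (an assumption of this demo).
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))
x = rng.normal(size=(512, 16)) + 2.0 * rng.normal(size=(512, 1))

w_gptq = gptq_sketch(w, x)
w_rtn = round_to_grid(w, np.abs(w).max(axis=1, keepdims=True) / 7)

def layer_error(w_hat):
    """Norm of the change in layer output on the calibration inputs."""
    return np.linalg.norm(x @ (w_hat - w).T)

print("round-to-nearest output error:", layer_error(w_rtn))
print("compensated output error:     ", layer_error(w_gptq))
```

The compensation only helps when input features are correlated, because that is what lets one weight make up for the rounding error of another. With uncorrelated inputs the update terms shrink toward zero and the result approaches plain rounding.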

AWQ

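AWQ stands for activation-aware weight quantization. It observes that a small fraction of weight channels line up with large activations and matter far more to the output than the rest. Instead of treating all channels equally, AWQ scales those important channels up before quantizing and divides the matching activations by the same factor, so the layer output is unchanged. This reduces the relative rounding error on the important channels while every weight is still stored at low precision.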
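A rough sketch of the scaling idea, assuming a single linear layer `y = x @ w.T` and a per-row 4-bit round-to-nearest quantizer. The per-channel scale is taken as the mean absolute activation raised to a power `alpha`, and `alpha` is picked by a small grid search on the calibration data. The names `awq_sketch`, `quantize_rtn`, and `output_error` and the toy data are hypothetical, and details of real implementations such as group-wise quantization are omitted.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Symmetric round-to-nearest quantization, one scale per output row."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def awq_sketch(w, x, bits=4, n_grid=20):
    """Search a per-input-channel scale that protects high-activation channels.

    w has shape (out, in); x is calibration input of shape (samples, in).
    Returns the effective dequantized weights and the chosen alpha.
    """
    act_mag = np.maximum(np.abs(x).mean(axis=0), 1e-8)  # activation size per channel
    ref = x @ w.T                                        # full-precision output
    best_err, best_w, best_alpha = np.inf, None, None
    for alpha in np.linspace(0.0, 1.0, n_grid + 1):
        s = act_mag ** alpha                             # alpha = 0 means no scaling
        # Quantize the scaled weights; dividing by s afterwards stands in for
        # dividing the activations by s at inference time.
        w_eff = quantize_rtn(w * s, bits) / s
        err = np.linalg.norm(x @ w_eff.T - ref)
        if err < best_err:
            best_err, best_w, best_alpha = err, w_eff, alpha
    return best_w, best_alpha

# Toy data where two input channels carry much larger activations (demo assumption).
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))
x = rng.normal(size=(512, 16))
x[:, :2] *= 20.0

w_awq, alpha = awq_sketch(w, x)

def output_error(w_hat):
    """Norm of the change in layer output on the calibration inputs."""
    return np.linalg.norm(x @ (w_hat - w).T)

print("chosen alpha:", alpha)
print("round-to-nearest output error:", output_error(quantize_rtn(w)))
print("scaled output error:          ", output_error(w_awq))
```

Because `alpha = 0` reproduces plain rounding, the search in this sketch can never do worse than round-to-nearest on the calibration data. How well the chosen scale carries over to unseen inputs depends on how representative the calibration set is.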

Why they help

  • Both run after training, needing no full retraining.
  • Both use calibration data to measure what matters.
  • Both make 4-bit models far more usable than naive rounding.

Key idea

GPTQ compensates for quantization error across weights, while AWQ protects the channels with large activations. Both use calibration data to keep 4-bit models accurate.

Check yourself

1. What does AWQ pay special attention to?

2. How does GPTQ keep accuracy high?

3. What do both methods require?