Beyond naive rounding
Rounding every weight to its nearest quantization level treats all weights alike and ignores how much each one affects the layer's output. GPTQ and AWQ are post-training methods that quantize a finished model more carefully, using a small calibration dataset, so quality holds up even at 4 bits.
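For reference, naive rounding can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a symmetric signed 4-bit grid with one scale per weight row; the function name `quantize_rtn` and the random test matrix are made up for the example and do not come from any library.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Symmetric round-to-nearest quantization with one scale per row."""
    qmax = 2 ** (bits - 1) - 1                          # 7 for signed 4-bit
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)            # guard all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)   # integer codes
    return q * scale                                    # dequantized weights

# Hypothetical weight matrix: 8 output rows, 16 input columns.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
W_q = quantize_rtn(W)
print("max abs rounding error:", np.abs(W - W_q).max())
```

Every weight is rounded independently here, and nothing in the procedure looks at the data the layer will actually see. That is the gap GPTQ and AWQ try to close.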
GPTQ
GPTQ quantizes a layer's weight matrix one column at a time and compensates for the error as it goes. After rounding a column, it adjusts the remaining unquantized weights to absorb the error just introduced, using second-order (curvature) information estimated from the layer's inputs on the calibration data. This keeps the layer output close to the original.
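The column-by-column compensation can be sketched as follows. This is a simplified illustration rather than the reference implementation: it assumes a per-row symmetric 4-bit grid, takes the curvature matrix to be the damped product of the calibration inputs with themselves, and updates the inverse directly instead of using the blocked, Cholesky-based updates that production code relies on. The names `gptq_sketch`, `round_to_grid`, and `damp`, and the random data, are hypothetical.

```python
import numpy as np

def round_to_grid(w, scale, qmax):
    """Round to the nearest level of a symmetric integer grid."""
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def gptq_sketch(W, X, bits=4, damp=0.01):
    """W: (out, in) weights. X: (samples, in) calibration inputs."""
    W = W.astype(np.float64)                      # working copy, input untouched
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max(axis=1) / qmax          # one scale per row, fixed up front
    scale = np.where(scale == 0, 1.0, scale)
    H = X.T @ X                                   # curvature of the layer-output error
    H += damp * np.mean(np.diag(H)) * np.eye(H.shape[0])  # damping for stability
    Hinv = np.linalg.inv(H)
    Q = np.zeros_like(W)
    for i in range(W.shape[1]):
        Q[:, i] = round_to_grid(W[:, i], scale, qmax)
        err = (W[:, i] - Q[:, i]) / Hinv[i, i]
        # Push the rounding error onto the columns not yet quantized.
        W[:, i + 1:] -= np.outer(err, Hinv[i, i + 1:])
        # Drop column i from the inverse so later steps ignore it.
        Hinv -= np.outer(Hinv[:, i], Hinv[i, :]) / Hinv[i, i]
    return Q

# Hypothetical data: correlated calibration inputs and random weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16)) @ rng.normal(size=(16, 16))
W = rng.normal(size=(8, 16))

Q = gptq_sketch(W, X)
row_scale = (np.abs(W).max(axis=1) / 7)[:, None]
Q_rtn = round_to_grid(W, row_scale, 7)            # naive baseline on the same grid
err_gptq = np.mean((X @ Q.T - X @ W.T) ** 2)
err_rtn = np.mean((X @ Q_rtn.T - X @ W.T) ** 2)
print("layer output MSE, naive rounding:", err_rtn)
print("layer output MSE, GPTQ-style:   ", err_gptq)
```

Both results use the same 4-bit grid. The only difference is that the GPTQ-style loop shifts the not-yet-quantized weights before rounding them, so the comparison isolates the effect of error compensation on the layer output.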
AWQ
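AWQ stands for Activation-aware Weight Quantization. It observes that a small fraction of weight channels, those multiplied by large activations, matter far more than the rest. Instead of treating all channels equally, AWQ scales these salient channels up before quantizing, and scales the matching activations down to compensate, so their precision is protected while every weight is still stored in the same compact low-bit format.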
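A rough sketch of the scaling idea, under stated assumptions: the per-channel scale is taken as the mean absolute activation raised to a power `alpha`, and `alpha` is chosen by a small grid search on the layer's output error. The function names, the grid size, and the synthetic data with two artificially large input channels are all hypothetical. In a real deployment the inverse scale is folded into the preceding operation; here it is folded back into the weights, which is mathematically equivalent and makes the comparison easy.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Symmetric round-to-nearest quantization with one scale per row."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def awq_sketch(W, X, bits=4, n_grid=20):
    """W: (out, in) weights. X: (samples, in) calibration inputs."""
    act = np.maximum(np.abs(X).mean(axis=0), 1e-8)   # activation size per input channel
    ref = X @ W.T                                    # full-precision layer output
    best_W, best_alpha, best_err = None, None, np.inf
    for alpha in np.linspace(0.0, 1.0, n_grid):
        s = act ** alpha                             # alpha = 0 means no scaling
        # Scale salient channels up, quantize, then fold the scale back.
        W_eff = quantize_rtn(W * s, bits) / s
        err = np.mean((X @ W_eff.T - ref) ** 2)
        if err < best_err:
            best_W, best_alpha, best_err = W_eff, alpha, err
    return best_W, best_alpha, best_err

# Hypothetical data: two input channels carry much larger activations.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16))
X[:, :2] *= 20.0
W = rng.normal(size=(8, 16))

W_awq, alpha, err_awq = awq_sketch(W, X)
err_rtn = np.mean((X @ quantize_rtn(W).T - X @ W.T) ** 2)
print("chosen alpha:", alpha)
print("layer output MSE, naive rounding:", err_rtn)
print("layer output MSE, AWQ-style:     ", err_awq)
```

Because the search includes `alpha = 0`, which reduces to plain rounding, the selected scaling can never do worse than naive rounding on the calibration data.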
Why they help
- Both run after training, needing no full retraining.
- Both use calibration data to measure what matters.
- Both make 4-bit models far more usable than naive rounding does.
Key idea
GPTQ compensates for quantization error by adjusting the weights it has not yet quantized, while AWQ protects the channels that see large activations. Both use calibration data to keep 4-bit models accurate.