GPTQ and AWQ Quantization

Smarter post-training quantization methods that protect the weights that matter most.

Beyond naive rounding

Rounding every weight to the nearest quantization level ignores how much each weight affects the layer's output. GPTQ and AWQ are post-training methods that quantize an already trained model more carefully, using a small calibration dataset to measure that effect, so quality holds up even at 4 bits.
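
To see what naive rounding misses, here is a minimal Python sketch of symmetric round-to-nearest quantization. The function name quantize_rtn, the toy weights, and the input vector are illustrative assumptions, not taken from any library.

```python
def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest with one shared step size (illustrative helper)."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for signed 4-bit codes
    max_abs = max(abs(w) for w in weights)
    step = max_abs / qmax if max_abs > 0 else 1.0   # avoid dividing by zero
    codes = [max(-qmax - 1, min(qmax, round(w / step))) for w in weights]
    return codes, [c * step for c in codes]

# Toy values (assumed): the third weight is small, but its input is large.
weights = [0.70, -0.31, 0.04, 0.12]
inputs = [0.1, 0.1, 10.0, 0.1]

codes, dequantized = quantize_rtn(weights)

exact = sum(w * x for w, x in zip(weights, inputs))
approx = sum(w * x for w, x in zip(dequantized, inputs))

print(codes)                 # integer codes stored in place of the floats
print(abs(exact - approx))   # output error caused by rounding
```

Here the weight 0.04 rounds to zero. The weight error is tiny, but because its input is 10.0 it accounts for almost all of the output error. Plain rounding has no way to notice this, which is the gap GPTQ and AWQ address.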

GPTQ

GPTQ quantizes a layer's weights one column at a time and compensates for the rounding error as it goes. After rounding a column, it adjusts the remaining unquantized weights to absorb the error just introduced, using second-order (curvature) information estimated from the calibration data. This keeps the layer's output close to the original.
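
The compensation step can be sketched for a single output row in plain Python. This is a simplified illustration under stated assumptions, not the reference implementation: it takes the curvature to be the Hessian of the squared output error (the sum of outer products of the calibration inputs, plus a small damping term), uses a fixed step size, and updates the inverse Hessian directly, whereas practical implementations typically rely on a Cholesky factorization and process columns in blocks. The helper names invert, round_to_grid, and gptq_row and the toy data are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def round_to_grid(w, step, qmax=7):
    """Round one weight to the nearest signed 4-bit level (fixed step, assumed)."""
    return max(-qmax - 1, min(qmax, round(w / step))) * step

def gptq_row(weights, calib_inputs, step, damp=0.01):
    """Quantize one output row left to right, compensating error as it goes."""
    n = len(weights)
    # Hessian of the squared output error, up to a constant factor: sum of x x^T.
    hessian = [[sum(x[i] * x[j] for x in calib_inputs) for j in range(n)] for i in range(n)]
    mean_diag = sum(hessian[i][i] for i in range(n)) / n
    for i in range(n):
        hessian[i][i] += damp * mean_diag        # damping keeps the inverse stable
    hinv = invert(hessian)

    w = list(weights)
    for i in range(n):
        q = round_to_grid(w[i], step)
        err = (w[i] - q) / hinv[i][i]
        w[i] = q
        # Shift the not-yet-quantized weights to absorb the error just made.
        for j in range(i + 1, n):
            w[j] -= err * hinv[i][j]
        # Drop weight i from the inverse Hessian for the remaining problem.
        d = hinv[i][i]
        col = [hinv[r][i] for r in range(n)]
        row = list(hinv[i])
        for r in range(i + 1, n):
            for c in range(i + 1, n):
                hinv[r][c] -= col[r] * row[c] / d
    return w

# Toy data (assumed): two strongly correlated input features.
calib_inputs = [[1.0, 1.0], [1.0, 0.9], [0.9, 1.0], [1.1, 1.0]]
weights = [0.26, 0.26]
step = 0.1

rtn = [round_to_grid(w, step) for w in weights]
gptq = gptq_row(weights, calib_inputs, step)

def output_error(quantized):
    """Sum of squared output differences over the calibration inputs."""
    return sum(
        sum((w - q) * xi for w, q, xi in zip(weights, quantized, x)) ** 2
        for x in calib_inputs
    )

print(rtn, output_error(rtn))
print(gptq, output_error(gptq))
```

On this toy data, plain rounding pushes both weights up to 0.3. The compensated pass rounds the first weight up and then lowers the second one enough that it rounds down to 0.2, so the two errors largely cancel in the output.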

AWQ

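AWQ stands for activation-aware weight quantization. It observes that a small fraction of weight channels, those multiplied by the largest activations, matter far more than the rest for output quality. Instead of treating all channels equally, AWQ scales these important channels up before quantizing (and scales the matching activations down by the same factor, so the layer computes the same function), which reduces their relative rounding error while every weight is still stored at the same low bit width.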
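
A rough Python sketch of the scaling idea for one output row follows. It assumes each input channel is scaled by its average activation magnitude raised to a power alpha, with alpha picked by a small grid search on calibration data; the scale is divided back out after quantization, which in a real model would be folded into the preceding operation. The names quantize_group and awq_row, the grid size, and the toy data are hypothetical, and details such as scale normalization and weight grouping are left out.

```python
def quantize_group(weights, bits=4):
    """Round-to-nearest with one shared step size for the whole group."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights) or 1.0
    step = max_abs / qmax
    return [max(-qmax - 1, min(qmax, round(w / step))) * step for w in weights]

def output_error(weights, effective, calib_inputs):
    """Sum of squared output differences over the calibration inputs."""
    return sum(
        sum((w - e) * xi for w, e, xi in zip(weights, effective, x)) ** 2
        for x in calib_inputs
    )

def awq_row(weights, calib_inputs, grid=20):
    """Search for per-channel scales that minimize output error after quantization."""
    n = len(weights)
    # Average activation magnitude per input channel (floored to stay positive).
    act_mag = [
        max(sum(abs(x[j]) for x in calib_inputs) / len(calib_inputs), 1e-8)
        for j in range(n)
    ]
    best = None
    for k in range(grid + 1):
        alpha = k / grid                      # alpha = 0 reproduces plain rounding
        scales = [m ** alpha for m in act_mag]
        quantized = quantize_group([w * s for w, s in zip(weights, scales)])
        # Dividing by the scale models the inverse scaling applied to activations.
        effective = [q / s for q, s in zip(quantized, scales)]
        err = output_error(weights, effective, calib_inputs)
        if best is None or err < best[0]:
            best = (err, alpha, effective)
    return best

# Toy data (assumed): the third input channel has much larger activations.
weights = [0.70, -0.31, 0.04, 0.12]
calib_inputs = [
    [0.1, -0.2, 10.0, 0.1],
    [0.2, 0.1, -8.0, -0.1],
    [-0.1, 0.1, 12.0, 0.2],
]

rtn_err = output_error(weights, quantize_group(weights), calib_inputs)
awq_err, best_alpha, effective = awq_row(weights, calib_inputs)

print(rtn_err)
print(awq_err, best_alpha)
```

Because alpha = 0 is part of the search, the chosen scales can never do worse than plain rounding on the calibration data. On this toy row the search settles on a positive alpha, trading a little precision on the low-activation channels for a much more accurate third weight.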

Why they help

Both run after training and need no retraining.
Both use calibration data to measure which weights matter.
Both make 4-bit models far more usable than naive rounding does.

Key idea

GPTQ compensates for quantization error across the remaining weights, while AWQ protects the channels that see large activations. Both use calibration data to keep 4-bit models accurate.
