Machine Learning

Post Training Quantization

Convert a trained float model to low bit integers without retraining.

What it is

Post training quantization, or PTQ, takes an already trained floating point model and converts its weights, and often its activations, to low bit integers such as eight bit. Eight bit values take a quarter of the memory of 32 bit floats, and integer arithmetic is typically faster on hardware that supports it, so the model runs more cheaply on the same hardware.

How values are mapped

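Quantization maps a float range to integers with a scale and a zero point. A value is divided by the scale, rounded, offset by the zero point, and clamped to the integer range. Subtracting the zero point and multiplying by the scale recovers an approximation of the original value. The challenge is picking a range that captures most values without wasting bits on rare outliers.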
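The mapping described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming an unsigned eight bit target range of 0 to 255; the function names choose_qparams, quantize, and dequantize are made up for this example and do not come from any particular library.

```python
def choose_qparams(x_min, x_max, qmin=0, qmax=255):
    """Pick a scale and zero point so [x_min, x_max] maps onto [qmin, qmax]."""
    # Widen the range to include 0.0 so that zero has an exact integer code.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate range (all zeros): any positive scale works
    zero_point = round(qmin - x_min / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Divide by the scale, round, offset by the zero point, then clamp."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Recover an approximation of the original float value."""
    return (q - zero_point) * scale


# Example: map the float range [-1.0, 3.0] onto 0..255.
scale, zero_point = choose_qparams(-1.0, 3.0)
q = quantize(0.5, scale, zero_point)
approx = dequantize(q, scale, zero_point)  # close to 0.5, not exactly equal
```

Values outside the chosen range, such as 100.0 here, are clamped to the nearest end of the integer range, which is why the choice of range matters so much.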

  • Per tensor uses one scale for the whole tensor, which is simplest but coarse.
  • Per channel uses a separate scale per output channel, which handles uneven weight ranges far better.
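To see why the granularity matters, here is a small sketch that compares the two options on a made-up weight matrix with one small range channel and one large range channel. It assumes symmetric signed eight bit quantization (zero point fixed at 0, integer range -127 to 127), and all names and numbers are illustrative only.

```python
def symmetric_scale(values, qmax=127):
    """Scale that maps the largest absolute value onto qmax."""
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def fake_quantize(values, scale, qmax=127):
    """Quantize to signed integers, then dequantize back to floats."""
    out = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))
        out.append(q * scale)
    return out


def mean_abs_error(original, restored):
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


# Hypothetical weights: each row is one output channel.
weights = [
    [0.010, -0.020, 0.015, -0.005],  # small range channel
    [4.000, -8.000, 6.000, -2.000],  # large range channel
]

# Per tensor: a single scale computed over every weight.
flat = [w for row in weights for w in row]
tensor_scale = symmetric_scale(flat)
per_tensor = [fake_quantize(row, tensor_scale) for row in weights]

# Per channel: one scale per row.
per_channel = [fake_quantize(row, symmetric_scale(row)) for row in weights]

err_tensor_small = mean_abs_error(weights[0], per_tensor[0])
err_channel_small = mean_abs_error(weights[0], per_channel[0])
```

With the shared scale, the large channel dictates the step size, so every weight in the small channel rounds to zero. With its own scale, the small channel keeps its structure and err_channel_small is far lower than err_tensor_small.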

Static versus dynamic

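  • Dynamic quantization quantizes weights ahead of time and computes activation scales on the fly at inference time, which needs no calibration data but adds some runtime work.
  • Static quantization runs a small calibration set through the model first to record activation ranges, then fixes the activation scales before deployment. This removes the runtime cost, and accuracy depends on how representative the calibration set is.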
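The difference between the two modes can be sketched with a simple min/max range tracker. This is a toy illustration under stated assumptions: the MinMaxObserver class, the helper functions, and the calibration numbers are all hypothetical, and real toolkits offer more robust range estimators than a plain minimum and maximum.

```python
class MinMaxObserver:
    """Tracks the smallest and largest activation values seen so far."""

    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def observe(self, activations):
        self.lo = min(self.lo, min(activations))
        self.hi = max(self.hi, max(activations))

    def qparams(self, qmin=0, qmax=255):
        # Include 0.0 in the range so zero is exactly representable.
        lo, hi = min(self.lo, 0.0), max(self.hi, 0.0)
        scale = (hi - lo) / (qmax - qmin) or 1.0
        zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
        return scale, zero_point


def quantize_all(xs, scale, zero_point, qmin=0, qmax=255):
    return [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in xs]


# Static: record ranges over a calibration set once, then freeze the scale.
calibration_batches = [[-0.5, 0.2, 1.0], [0.1, 1.5, -0.3], [2.0, 0.0, 0.7]]
observer = MinMaxObserver()
for batch in calibration_batches:
    observer.observe(batch)
static_scale, static_zp = observer.qparams()


def static_quantize(activations):
    # Reuses the frozen calibration range for every input.
    return quantize_all(activations, static_scale, static_zp)


def dynamic_quantize(activations):
    # Recomputes the range from the current input on every call.
    obs = MinMaxObserver()
    obs.observe(activations)
    scale, zp = obs.qparams()
    return quantize_all(activations, scale, zp), scale, zp
```

An input such as [4.0, 0.0] falls outside the calibrated range, so static_quantize clamps 4.0 to the top code, while dynamic_quantize widens its scale to fit it at the cost of computing the range at inference time.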

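PTQ is attractive because it needs no labels and no retraining; even static calibration only needs a small set of unlabeled inputs. The trade-off is that very low bit widths, such as four bits or fewer, can lose noticeable accuracy.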

Key idea

Post training quantization converts a trained float model to low bit integers using scales and zero points; calibration and per channel scaling preserve accuracy without any retraining.

Check yourself

1. What is the key appeal of post training quantization?

2. Why does per channel quantization often beat per tensor?