← Lessons

quiz vs the machine

Gold1450

Machine Learning

Product Quantization

Compressing vectors into tiny codes so millions fit in memory.

5 min read · core · beat Gold to climb

The memory problem

A single embedding can be hundreds of floating point numbers. Store millions of them and memory explodes. Product quantization, or PQ, compresses each vector into a small code, often a handful of bytes, so huge collections fit in RAM.

How it works

PQ splits a vector into several subvectors. For each subvector position it learns a small codebook of representative points. Each subvector is then replaced by the id of its nearest codebook entry, so the whole vector becomes a short list of ids.

Searching with codes

Distances are estimated from the codes using precomputed lookup tables, which is far cheaper than working with full vectors. The result is approximate, since the code only stores a representative point, not the exact value.

The tradeoff

  • More subvectors and larger codebooks mean better accuracy but bigger codes.
  • Aggressive compression saves memory but lowers recall.

PQ is often combined with IVF, where IVF narrows the candidates and PQ keeps them tiny.

Key idea

Product quantization splits vectors, replaces each part with a codebook id, and estimates distances from these small codes, trading accuracy for large memory savings.

Check yourself

Answer to earn rating on the learn ladder.

1. What does product quantization store instead of the full vector?

2. What is the main benefit of product quantization?