The Recall vs Latency Tradeoff

One curve, many knobs

Vector search lives on a tradeoff between recall (the fraction of true nearest neighbors a query returns) and latency (how long each query takes). Almost every tuning knob moves you along this curve rather than letting you escape it.
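To make the two axes concrete, here is a minimal sketch of how each one might be measured for a single query. The helper names recall_at_k and timed are illustrative and do not come from any particular library, and search_fn stands in for whatever index you are testing.

```python
import time


def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k neighbors found in the retrieved top-k.

    Both arguments are lists of ids ordered best-first; assumes k >= 1
    and a non-empty ground_truth list.
    """
    true_ids = set(ground_truth[:k])
    found = true_ids.intersection(retrieved[:k])
    return len(found) / len(true_ids)


def timed(search_fn, query):
    """Run search_fn(query) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = search_fn(query)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# Three of the four true neighbors were returned.
example_recall = recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], k=4)
```

In practice both numbers would be averaged over many queries, and latency is often reported as a percentile rather than a mean.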

Knobs that push the balance

nprobe in IVF: probing more buckets raises recall and raises latency.
efSearch in HNSW (spelled ef or ef_search in some libraries): a larger candidate list improves recall but costs time on every query.
Quantization level: stronger compression speeds up search but can lower recall.
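The nprobe effect can be seen in a toy sketch. This is not a real IVF implementation: it uses one-dimensional points, fixed evenly spaced centroids, and the number of candidates scanned as a stand-in for latency. All names and sizes here are assumptions chosen for illustration.

```python
import random


def build_ivf(points, centroids):
    """Assign each point index to the bucket of its nearest centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for idx, p in enumerate(points):
        nearest = min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
        buckets[nearest].append(idx)
    return buckets


def ivf_search(query, points, centroids, buckets, k, nprobe):
    """Scan only the nprobe buckets closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: abs(query - centroids[c]))
    candidates = [i for c in order[:nprobe] for i in buckets[c]]
    candidates.sort(key=lambda i: (abs(points[i] - query), i))
    return candidates[:k], len(candidates)


def exact_search(query, points, k):
    """Brute-force ground truth: scan every point."""
    ranked = sorted(range(len(points)), key=lambda i: (abs(points[i] - query), i))
    return ranked[:k]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
points = [rng.random() for _ in range(2000)]
centroids = [(c + 0.5) / 16 for c in range(16)]  # 16 evenly spaced buckets
buckets = build_ivf(points, centroids)
queries = [rng.random() for _ in range(50)]
k = 10


def sweep(nprobe_values):
    """Return (nprobe, mean recall@k, mean candidates scanned) per setting."""
    rows = []
    for nprobe in nprobe_values:
        hits = 0
        scanned = 0
        for q in queries:
            truth = set(exact_search(q, points, k))
            found, n = ivf_search(q, points, centroids, buckets, k, nprobe)
            hits += len(truth.intersection(found))
            scanned += n
        rows.append((nprobe, hits / (k * len(queries)), scanned / len(queries)))
    return rows


results = sweep([1, 2, 4, 16])
for nprobe, recall, scanned in results:
    print(f"nprobe={nprobe:2d}  recall={recall:.3f}  scanned={scanned:.0f}")
```

Each larger nprobe scans a superset of the previous candidates, so recall never drops while the work per query grows. Probing all 16 buckets is the same as exact search.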

How to choose a point

Start from a target. If the product needs results in under fifty milliseconds, tune the knobs until you reach the best recall within that budget. If recall must exceed ninety-five percent, find the fastest setting that still meets it and accept the latency that requires.
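Both selection rules can be sketched as a filter over measured operating points. The measurements below are made-up placeholder values, not benchmark results, and the function names are hypothetical.

```python
def best_recall_within_budget(measurements, latency_budget_ms):
    """Highest-recall setting whose latency is under the budget, or None."""
    eligible = [m for m in measurements if m["latency_ms"] < latency_budget_ms]
    return max(eligible, key=lambda m: m["recall"]) if eligible else None


def fastest_exceeding_recall(measurements, min_recall):
    """Lowest-latency setting whose recall exceeds the target, or None."""
    eligible = [m for m in measurements if m["recall"] > min_recall]
    return min(eligible, key=lambda m: m["latency_ms"]) if eligible else None


# Placeholder numbers for illustration only; measure your own index.
measurements = [
    {"nprobe": 1, "recall": 0.71, "latency_ms": 8.0},
    {"nprobe": 4, "recall": 0.88, "latency_ms": 19.0},
    {"nprobe": 16, "recall": 0.94, "latency_ms": 41.0},
    {"nprobe": 64, "recall": 0.98, "latency_ms": 120.0},
]

latency_first = best_recall_within_budget(measurements, latency_budget_ms=50.0)
recall_first = fastest_exceeding_recall(measurements, min_recall=0.95)
```

With these placeholder values the latency-first rule picks nprobe 16 and the recall-first rule picks nprobe 64, which shows that the two targets can land on different points of the same curve.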

A common mistake

Reporting recall without latency, or latency without recall, hides the tradeoff. Always state both, since one method is better than another only if it wins at the same point on the curve: higher recall at equal latency, or lower latency at equal recall.
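One way to make that comparison rule explicit is a dominance check on (recall, latency) pairs. This is a small sketch with a hypothetical function name and illustrative numbers.

```python
def dominates(a, b):
    """True if point a is no worse than b on both axes and better on one.

    Each point is a (recall, latency_ms) tuple.
    """
    recall_a, latency_a = a
    recall_b, latency_b = b
    no_worse = recall_a >= recall_b and latency_a <= latency_b
    strictly_better = recall_a > recall_b or latency_a < latency_b
    return no_worse and strictly_better


# Higher recall and lower latency: a clear win.
clear_win = dominates((0.95, 20.0), (0.90, 30.0))

# Higher recall but also higher latency: neither point dominates,
# so quoting only the recall figure would be misleading.
a_over_b = dominates((0.97, 60.0), (0.90, 30.0))
b_over_a = dominates((0.90, 30.0), (0.97, 60.0))
```

When neither point dominates, the two methods sit at different places on the tradeoff and need to be re-tuned to a shared latency or recall before one can be called better.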

Key idea

Recall and latency trade against each other along one curve, so tune to a target budget and always report both numbers together.
