Machine Learning

The Recall vs Latency Tradeoff

Almost every approximate nearest neighbor (ANN) knob moves you along the same curve between quality and speed.

One curve, many knobs

Vector search lives on a tradeoff between recall (the fraction of true nearest neighbors a query returns) and latency (how long each query takes). Almost every tuning knob moves you along this curve rather than escaping it.
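
To make the two quantities concrete, here is a minimal sketch of how each might be measured for a single query. The helper names `recall_at_k` and `timed_ms` are illustrative and not taken from any library; the sketch assumes neighbors are given as lists of ids and that the exact list is non-empty.

```python
import time

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    truth = set(exact_ids[:k])
    return len(truth.intersection(approx_ids[:k])) / len(truth)

def timed_ms(search_fn, query):
    """Run one query and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = search_fn(query)
    return result, (time.perf_counter() - start) * 1000.0

# 3 of the 4 true neighbors (3, 7, 1) appear in the approximate result.
print(recall_at_k([7, 3, 9, 1], [3, 7, 4, 1], k=4))  # 0.75
```

In practice both numbers are usually averaged (or taken at a percentile, for latency) over many queries rather than read off a single one.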

Knobs that push the balance

  • nprobe in IVF: more buckets searched means higher recall and higher latency.
  • efSearch in HNSW: a larger candidate list improves recall but costs time.
  • Quantization level: stronger compression speeds up search but can lower recall.
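
The nprobe knob can be illustrated with a toy IVF-style index in pure Python. This is a sketch under simplifying assumptions, not how a production library implements it: the centroids are randomly sampled points rather than trained with k-means, the data is random, and names such as `ivf_search` and `sweep` are made up for this example.

```python
import random
import time

random.seed(0)
DIM, N, N_LISTS, K = 8, 2000, 20, 10

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

points = [[random.random() for _ in range(DIM)] for _ in range(N)]
# Assumption: sampled points stand in for trained centroids.
centroids = random.sample(points, N_LISTS)

# Assign every point to the bucket of its nearest centroid.
buckets = [[] for _ in range(N_LISTS)]
for idx, p in enumerate(points):
    nearest = min(range(N_LISTS), key=lambda c: dist2(p, centroids[c]))
    buckets[nearest].append(idx)

def exact_search(q, k):
    """Brute-force ground truth over all points."""
    return sorted(range(N), key=lambda i: dist2(q, points[i]))[:k]

def ivf_search(q, k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to q."""
    probe = sorted(range(N_LISTS), key=lambda c: dist2(q, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in buckets[c]]
    return sorted(candidates, key=lambda i: dist2(q, points[i]))[:k]

queries = [[random.random() for _ in range(DIM)] for _ in range(20)]
truth = [set(exact_search(q, K)) for q in queries]

def sweep(nprobe_values):
    """Return (nprobe, mean recall@K, mean ms per query) for each setting."""
    rows = []
    for nprobe in nprobe_values:
        start = time.perf_counter()
        results = [ivf_search(q, K, nprobe) for q in queries]
        ms = (time.perf_counter() - start) * 1000.0 / len(queries)
        hits = sum(len(t & set(r)) for t, r in zip(truth, results))
        rows.append((nprobe, hits / (K * len(queries)), ms))
    return rows

for nprobe, recall, ms in sweep([1, 2, 5, 10, 20]):
    print(f"nprobe={nprobe:2d}  recall@{K}={recall:.2f}  latency={ms:.2f} ms/query")
```

Probing all buckets degenerates into an exact scan, so recall reaches 1.0 at nprobe equal to the number of lists. Measured latency depends on the machine, but it should grow roughly with the number of candidates scanned.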

How to choose a point

Start from a target. If the product needs results in under 50 ms, tune the knobs until you reach the best recall within that budget. If recall must exceed 95%, find the cheapest setting that clears it and accept the latency that requires.
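
Once a sweep over a knob has been measured, picking the operating point is a small filter-and-select step. The sketch below assumes the measurements are already available; the `Point` type, the function names, and the numbers are all made up for illustration.

```python
from typing import NamedTuple, Optional, Sequence

class Point(NamedTuple):
    setting: int       # e.g. an nprobe or efSearch value
    recall: float      # in [0, 1]
    latency_ms: float  # per-query latency, e.g. a p95 figure

def best_recall_within(points: Sequence[Point], budget_ms: float) -> Optional[Point]:
    """Highest-recall point whose latency is under the budget, if any."""
    ok = [p for p in points if p.latency_ms < budget_ms]
    return max(ok, key=lambda p: p.recall) if ok else None

def fastest_exceeding(points: Sequence[Point], min_recall: float) -> Optional[Point]:
    """Lowest-latency point whose recall exceeds the floor, if any."""
    ok = [p for p in points if p.recall > min_recall]
    return min(ok, key=lambda p: p.latency_ms) if ok else None

# Hypothetical measurements, not real benchmark results.
measurements = [
    Point(1, 0.62, 4.0),
    Point(4, 0.85, 11.0),
    Point(16, 0.96, 38.0),
    Point(64, 0.99, 120.0),
]

print(best_recall_within(measurements, budget_ms=50.0))  # the setting=16 point
print(fastest_exceeding(measurements, min_recall=0.95))  # the setting=16 point
```

Both helpers return `None` when no measured setting meets the target, which is a signal to change the index type or hardware rather than keep turning the same knob.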

A common mistake

Reporting recall without latency, or latency without recall, hides the tradeoff. Always state both: a method is only better if it reaches higher recall at the same latency, or lower latency at the same recall.
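
One way to express "better" so that both numbers are required is a dominance check on (recall, latency) pairs. This is a sketch of that idea; the function name `dominates` and the figures are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on both axes and strictly better on one.

    Each result is a (recall, latency_ms) pair.
    """
    recall_a, latency_a = a
    recall_b, latency_b = b
    no_worse = recall_a >= recall_b and latency_a <= latency_b
    strictly_better = recall_a > recall_b or latency_a < latency_b
    return no_worse and strictly_better

# Made-up numbers for illustration only.
method_a = (0.97, 80.0)  # higher recall, slower
method_b = (0.90, 20.0)  # lower recall, faster
method_c = (0.97, 60.0)  # same recall as A, faster

print(dominates(method_a, method_b))  # False
print(dominates(method_b, method_a))  # False
print(dominates(method_c, method_a))  # True
```

A and B sit at different points on the curve, so neither beats the other. Quoting only A's recall or only B's latency would make one look like a clear winner when it is not.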

Key idea

Recall and latency trade against each other along one curve, so tune to a target budget and always report both numbers together.

Check yourself

1. What happens when you raise search effort in an ANN index?

2. Why should recall and latency always be reported together?