One curve, many knobs
Vector search lives on a tradeoff between recall (the fraction of true nearest neighbors a query returns) and latency (how long each query takes). Almost every tuning knob moves you along this curve rather than off it.
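Both quantities can be measured in the same pass over a query set. The sketch below is a minimal illustration, not any particular library's API: `search` is a hypothetical callable that takes a query and `k` and returns a list of ids, and `ground_truth` is assumed to hold the exact neighbor ids for each query, computed separately by brute force.

```python
import time

def recall_at_k(found, truth):
    """Fraction of the true neighbors that appear in the returned ids."""
    truth = set(truth)
    if not truth:
        raise ValueError("ground truth must not be empty")
    return len(truth & set(found)) / len(truth)

def measure(search, queries, ground_truth, k):
    """Return (mean recall@k, mean latency in ms) for one knob setting.

    `search(query, k)` is a hypothetical stand-in for the index under test.
    """
    recalls, latencies_ms = [], []
    for query, truth in zip(queries, ground_truth):
        start = time.perf_counter()
        found = search(query, k)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        recalls.append(recall_at_k(found, truth[:k]))
    return sum(recalls) / len(recalls), sum(latencies_ms) / len(latencies_ms)
```

Mean latency is used here for brevity; a tail percentile is often the more useful number when the budget is a hard limit.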
Knobs that push the balance
- nprobe in IVF: more buckets searched means higher recall and higher latency.
- efSearch in HNSW: a larger candidate list improves recall but costs time.
- Quantization level: stronger compression speeds search but can lower recall.
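To make the first knob concrete, here is a toy inverted-file index in pure Python. It is a sketch under loud assumptions: the centroids are random sample points rather than trained by k-means, the data is random 2D points, and the number of vectors scanned per query stands in for latency so the result is deterministic. All names are illustrative.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k(query, ids, points, k):
    """Exact scan over the given ids; ties are broken by id."""
    return sorted(ids, key=lambda i: (dist2(query, points[i]), i))[:k]

def build_buckets(points, centroids):
    """Assign every point to the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for i, p in enumerate(points):
        nearest = min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))
        buckets[nearest].append(i)
    return buckets

def ivf_search(query, points, centroids, buckets, nprobe, k):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: dist2(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in buckets[c]]
    return top_k(query, candidates, points, k), len(candidates)

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(2000)]
centroids = rng.sample(points, 16)  # assumption: untrained, randomly chosen centroids
buckets = build_buckets(points, centroids)
queries = [(rng.random(), rng.random()) for _ in range(50)]
k = 10
truths = [set(top_k(q, range(len(points)), points, k)) for q in queries]

def sweep(nprobes):
    """Return (nprobe, mean recall@k, mean vectors scanned) per setting."""
    rows = []
    for nprobe in nprobes:
        hits = scanned = 0
        for q, truth in zip(queries, truths):
            found, n = ivf_search(q, points, centroids, buckets, nprobe, k)
            hits += len(truth & set(found))
            scanned += n
        rows.append((nprobe, hits / (k * len(queries)), scanned / len(queries)))
    return rows

rows = sweep([1, 2, 4, 8, 16])
for nprobe, recall, scanned in rows:
    print(f"nprobe={nprobe:2d}  recall@{k}={recall:.3f}  scanned={scanned:.0f}")
```

Each larger nprobe scans a superset of the previous candidates, so recall never drops and work never shrinks. With all 16 buckets probed the search is exhaustive and recall reaches 1.0.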
How to choose a point
Start from a target and treat it as a constraint. If the product needs results in under 50 milliseconds, tune the knobs until you reach the best recall within that budget. If recall must exceed 95 percent, find the lowest latency that achieves it and accept that cost.
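Once a sweep has produced a list of (setting, recall, latency) measurements, both ways of choosing reduce to a filter followed by a max or min. The sketch below assumes such a list already exists; the `Point` type, the function names, and the numbers in `curve` are made up for illustration and are not measurements from any real index.

```python
from typing import NamedTuple, Optional, Sequence

class Point(NamedTuple):
    setting: int       # knob value, e.g. nprobe
    recall: float      # mean recall@k at this setting
    latency_ms: float  # latency at this setting

def best_recall_within(curve: Sequence[Point], budget_ms: float) -> Optional[Point]:
    """Highest-recall point whose latency fits the budget, or None."""
    ok = [p for p in curve if p.latency_ms <= budget_ms]
    return max(ok, key=lambda p: (p.recall, -p.latency_ms)) if ok else None

def fastest_exceeding(curve: Sequence[Point], recall_floor: float) -> Optional[Point]:
    """Lowest-latency point whose recall strictly exceeds the floor, or None."""
    ok = [p for p in curve if p.recall > recall_floor]
    return min(ok, key=lambda p: (p.latency_ms, -p.recall)) if ok else None

# Illustrative, made-up sweep results.
curve = [
    Point(1, 0.62, 4.0),
    Point(4, 0.85, 11.0),
    Point(16, 0.94, 38.0),
    Point(64, 0.97, 120.0),
    Point(128, 0.99, 260.0),
]

print(best_recall_within(curve, budget_ms=50.0))
print(fastest_exceeding(curve, recall_floor=0.95))
```

Returning None when no setting satisfies the constraint is deliberate: it signals that the target cannot be met on this curve and that a different index or different hardware is needed, not more tuning.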
A common mistake
Reporting recall without latency, or latency without recall, hides the tradeoff. Always state both: a method is only better if it reaches higher recall at the same latency, or lower latency at the same recall.
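One way to compare two methods fairly is to read each curve at the same recall and compare the latencies there. The sketch below assumes each curve is a list of at least two (recall, latency_ms) pairs and uses linear interpolation between measured points, which is only an approximation. The numbers are made up for illustration.

```python
def latency_at_recall(curve, target):
    """Interpolated latency at the target recall.

    Returns None when the target lies outside the measured range,
    since extrapolating a recall/latency curve is not reliable.
    """
    pts = sorted(curve)
    for (r0, t0), (r1, t1) in zip(pts, pts[1:]):
        if r0 <= target <= r1:
            if r1 == r0:
                return min(t0, t1)
            return t0 + (t1 - t0) * (target - r0) / (r1 - r0)
    return None

# Illustrative, made-up curves for two hypothetical methods.
method_a = [(0.80, 10.0), (0.90, 20.0), (0.98, 60.0)]
method_b = [(0.85, 8.0), (0.95, 40.0)]

target = 0.90
a_ms = latency_at_recall(method_a, target)
b_ms = latency_at_recall(method_b, target)
print(f"at recall {target}: A={a_ms:.1f} ms, B={b_ms:.1f} ms")
```

In this made-up data, method B has the lowest single latency figure (8 ms), yet at a matched recall of 0.90 method A is faster. Quoting the 8 ms alone would mislead.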
Key idea
Recall and latency trade against each other along one curve, so tune to a target budget and always report both numbers together.