System Design

Search Latency Optimization

Techniques that keep query responses fast, including control of tail latency.

Why latency is hard

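A query touches many shards and stages, and the user waits for the slowest of them. Median latency can look fine while the tail (the slowest few percent of requests) ruins the experience. The more shards a query fans out to, the more likely it is that at least one of them lands in that tail.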
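
The fan-out effect is simple arithmetic. This sketch assumes each shard is slow independently with the same probability, which is a simplification (real slowdowns are often correlated), and the function name is illustrative.

```python
def prob_query_hits_slow_shard(p_slow, num_shards):
    """Probability that at least one of num_shards shards is slow.

    Assumes each shard is slow independently with probability p_slow.
    """
    return 1 - (1 - p_slow) ** num_shards


for shards in (1, 10, 100):
    print(shards, round(prob_query_hits_slow_shard(0.01, shards), 3))
```

With a 1% chance of slowness per shard, roughly 10% of queries touching 10 shards, and about 63% of queries touching 100 shards, wait on at least one slow shard.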

Core techniques

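  • Caching stores results for popular queries so repeated requests skip retrieval entirely.
  • Early termination stops scanning a posting list once enough strong candidates are found. It is most effective when postings are sorted by score, so later entries are known to be weaker than those already collected.
  • Tiered indexes put high quality documents in a small, fast tier that is searched first; larger tiers are consulted only when the first tier returns too few results.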
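
A minimal sketch of how the three techniques can combine, assuming a toy in-memory index whose posting lists are already sorted by descending score. `TIER1`, `TIER2`, `top_k_early_terminate`, and `search` are hypothetical names, and Python's `functools.lru_cache` stands in for a real result cache.

```python
from functools import lru_cache

# Hypothetical toy index: term -> posting list of (doc_id, score),
# each list already sorted by descending score.
TIER1 = {  # small tier of high quality documents, searched first
    "latency": [("d1", 0.9), ("d2", 0.8)],
    "cache": [("d3", 0.95)],
}
TIER2 = {  # larger, lower quality tier, searched only if needed
    "latency": [("d7", 0.5), ("d8", 0.4), ("d9", 0.1)],
    "cache": [("d4", 0.6), ("d5", 0.3)],
}


def top_k_early_terminate(postings, k, min_score):
    """Collect up to k postings scoring at least min_score, then stop.

    Because the list is sorted by descending score, once k results are
    collected or a score falls below min_score, no later entry can improve
    the result, so the rest of the list is never scanned.
    """
    results = []
    for doc_id, score in postings:
        if len(results) == k or score < min_score:
            break  # early termination
        results.append((doc_id, score))
    return results


@lru_cache(maxsize=1024)  # caching: repeated queries skip retrieval entirely
def search(term, k=2, min_score=0.2):
    hits = top_k_early_terminate(TIER1.get(term, []), k, min_score)
    if len(hits) < k:
        # Tiering: fall back to the larger tier only when tier 1 falls short.
        hits += top_k_early_terminate(TIER2.get(term, []), k - len(hits), min_score)
    return tuple(hits)


print(search("latency"))  # answered from tier 1 alone
print(search("cache"))    # needs one extra result from tier 2
```

Returning a tuple keeps cached results immutable, so one caller cannot corrupt what the cache hands to the next. A production cache would also need invalidation when the index changes, which this sketch ignores.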

Taming the tail

Scatter-gather waits for every shard, so one slow shard slows the whole query. A hedged request sends a duplicate to another replica if the first has not answered within a short delay, uses whichever response arrives first, and cancels the other. This trades a little extra load for a much tighter tail.
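
A hedged request can be sketched with `asyncio`. Here `fake_replica` is a hypothetical stand-in for an RPC, and `hedge_after` is the delay before the duplicate is sent. One common heuristic sets it near a high percentile of normal latency, so only the slowest few percent of requests are ever duplicated.

```python
import asyncio


async def fake_replica(name, delay):
    """Hypothetical stand-in for an RPC to one replica."""
    await asyncio.sleep(delay)
    return name


async def hedged_request(primary, backup, hedge_after):
    """Call primary(); if it has not answered within hedge_after seconds,
    also call backup() and return whichever answer arrives first."""
    first = asyncio.create_task(primary())
    done, _ = await asyncio.wait({first}, timeout=hedge_after)
    if done:
        return first.result()  # fast path: no duplicate sent
    second = asyncio.create_task(backup())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop the slower copy to limit extra load
    return done.pop().result()


async def main():
    result = await hedged_request(
        lambda: fake_replica("replica-A", 0.5),   # slow primary
        lambda: fake_replica("replica-B", 0.05),  # healthy backup
        hedge_after=0.1,
    )
    print(result)  # replica-B, after about 0.15 s instead of 0.5 s


asyncio.run(main())
```

Because the duplicate is sent only after `hedge_after`, fast requests never pay the extra load; only the requests already in the tail get a second chance.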

Measure the right thing

Track high percentiles such as p99, not just the average. A system tuned only for the mean can still feel slow, because a few very slow responses barely move the mean, yet they are the ones users remember.
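
A small example of how the mean hides the tail, using made-up latencies and a nearest-rank percentile (one of several percentile definitions; monitoring systems often use others).

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in ms: 98 fast requests and 2 very slow ones.
latencies = [20] * 98 + [900, 1200]

print(statistics.mean(latencies))  # 40.6  (looks healthy)
print(percentile(latencies, 50))   # 20
print(percentile(latencies, 99))   # 900   (what the unlucky users feel)
```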

Key idea

Latency work mixes caching, early termination, and tiered indexes, plus hedged requests to control the tail that users feel most.

Check yourself

1. What does a hedged request do?

2. Why track high percentiles rather than only the average latency?

3. What does early termination do?