Requirements
- Return ranked suggestions on each keystroke in tens of milliseconds.
- Rank by popularity and freshness.
- Scale to a huge query vocabulary.
High level design
Typing fires a request per prefix, served from a precomputed structure rather than a live scan.
- Trie: a prefix tree stores terms, and each node caches its top completions so a lookup is a single descent.
- Aggregation pipeline: a background job counts query frequencies from logs and rebuilds the trie periodically.
- Serving cache: the precomputed top suggestions per prefix sit in memory for instant reads.
Bottlenecks
- Latency per keystroke: precompute the top completions at each node so serving avoids ranking at query time.
- Update freshness: rebuilding the whole trie is costly, so update in batches and merge popular new terms quickly.
- Scale: shard the trie by prefix across nodes when the vocabulary is too large for one machine.
Debounce on the client and cap suggestions to a handful so the response stays tiny.
Key idea
Autocomplete precomputes ranked completions into a trie so each keystroke is a fast prefix lookup, with a background pipeline keeping rankings fresh.