The scale problem
A catalog may hold millions of items, but a feed shows only a handful. Scoring every item with an expensive model on every request is infeasible within a serving latency budget. Recommenders solve this with a funnel: a sequence of stages in which each stage is cheaper and broader than the one after it, and each later stage is more precise.
The classic stages
- Retrieval quickly pulls a few hundred or thousand candidates from the full catalog using cheap signals.
- Ranking scores that smaller set with a heavy model to estimate how much the user will like each item.
- Re-ranking adjusts the final order for diversity, business rules, and freshness before display.
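The three stages above can be sketched as a toy pipeline. This is a minimal illustration, not a production design: the catalog, the genre-based inverted index, the `user_affinity` weights, and the scoring formula are all hypothetical stand-ins chosen so the example runs on its own.

```python
from collections import defaultdict

GENRES = ["news", "sports", "music", "film"]

# Hypothetical toy catalog of (item_id, genre, quality) tuples.
CATALOG = [(i, GENRES[i % 4], ((i * 37) % 101) / 100) for i in range(10_000)]

# Inverted index built offline, so retrieval never scans the full catalog.
BY_GENRE = defaultdict(list)
for item in CATALOG:
    BY_GENRE[item[1]].append(item)


def retrieve(user_genres, per_genre=200):
    """Cheap, recall-oriented stage: take a generous slice per liked genre."""
    candidates = []
    for genre in user_genres:
        candidates.extend(BY_GENRE[genre][:per_genre])
    return candidates


def rank(candidates, user_affinity):
    """Stand-in for the heavy model: it scores only the retrieved candidates."""
    def score(item):
        _, genre, quality = item
        return user_affinity.get(genre, 0.0) * quality
    return sorted(candidates, key=score, reverse=True)


def rerank(ranked, n=5, max_per_genre=3):
    """Diversity rule: cap how many slots one genre may take in the feed."""
    counts = defaultdict(int)
    final = []
    for item in ranked:
        if counts[item[1]] < max_per_genre:
            final.append(item)
            counts[item[1]] += 1
        if len(final) == n:
            break
    return final


def recommend(user_affinity, n=5):
    candidates = retrieve(list(user_affinity))
    return rerank(rank(candidates, user_affinity), n=n)


feed = recommend({"news": 1.0, "music": 0.6})
```

Here retrieval hands 400 of the 10,000 items to ranking, and re-ranking keeps the top genre from filling all five slots even though its items score highest.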
Why staged beats one shot
- Each stage sits at a different point on the recall-precision trade-off. Retrieval favors recall, so good items are not lost early.
- Ranking favors precision, spending compute only on survivors.
- Latency stays bounded because the heavy model never sees the full catalog.
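A back-of-the-envelope calculation makes the latency point concrete. Every number below is an illustrative assumption (catalog size, per-item model cost, candidate count, retrieval cost), not a measurement from a real system.

```python
# All figures are assumed for illustration only.
CATALOG_SIZE = 10_000_000   # items in the catalog
HEAVY_US_PER_ITEM = 50      # microseconds for the heavy model to score one item
CANDIDATES = 1_000          # items that survive retrieval
RETRIEVAL_US = 5_000        # assumed fixed cost of the cheap retrieval lookup

# One shot: the heavy model scores the whole catalog on every request.
one_shot_us = CATALOG_SIZE * HEAVY_US_PER_ITEM

# Staged: pay for retrieval once, then score only the survivors.
staged_us = RETRIEVAL_US + CANDIDATES * HEAVY_US_PER_ITEM

print(f"one shot: {one_shot_us / 1_000_000:.0f} s per request")
print(f"staged:   {staged_us / 1_000:.0f} ms per request")
```

Under these assumptions the one-shot design needs about 500 seconds per request, while the funnel needs about 55 milliseconds. The staged cost also depends on the candidate count rather than the catalog size, which is why it stays bounded as the catalog grows.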
The cost of mistakes
An item dropped in retrieval can never be recommended, no matter how good ranking is. So retrieval is tuned to be generous, and later stages do the fine sorting.
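One way to see this ceiling is to measure retrieval recall directly. The sketch below uses made-up item IDs and a hypothetical `recall` helper; it shows that even a perfect ranker can surface only the relevant items that retrieval let through.

```python
def recall(relevant, retrieved):
    """Fraction of the relevant items that appear in the retrieved set."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved)) / len(relevant)


def best_possible_feed(relevant, retrieved):
    """What a perfect ranker could show: relevant items that survived retrieval."""
    return [item for item in retrieved if item in relevant]


# Hypothetical ground truth and two retrieval settings.
relevant = {"a", "b", "c", "d"}
narrow = ["a", "x", "y"]                  # stingy retrieval
generous = ["a", "b", "c", "x", "y", "z"]  # generous retrieval

narrow_recall = recall(relevant, narrow)
generous_recall = recall(relevant, generous)
```

With the narrow setting, recall is 0.25 and the best possible feed contains only "a". The generous setting lifts recall to 0.75 at the price of more candidates for ranking to sort through.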
Key idea
Recommenders use a funnel of retrieval, ranking, and re-ranking so that cheap stages favor recall and expensive stages favor precision under a latency budget.