Requirements
- Suggest videos a user is likely to watch next.
- Serve recommendations with low latency from a catalog of billions of videos.
- Balance relevance with freshness and diversity.
High-level design
Recommendations use a two-stage funnel: candidate generation, then ranking.
- Candidate generation: cheap models retrieve a few hundred candidates from billions using embeddings and approximate nearest neighbor search.
- Ranking: a heavier model scores each candidate using rich features like watch history, recency, and engagement signals.
- Serving: precompute candidate sets offline where possible and rank at request time.
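A minimal sketch of the funnel, under stated assumptions: the catalog is a toy in-memory dict, brute-force dot-product top-k stands in for a real approximate nearest neighbor index, and the feature names (`engagement`, `age_days`) and ranking weights are hypothetical placeholders for a learned model.

```python
import heapq
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec, catalog, k):
    """Stage 1: cheap retrieval by embedding similarity.

    Brute-force top-k is an assumption of this sketch; at real catalog
    sizes an approximate nearest neighbor index would replace it.
    """
    return heapq.nlargest(k, catalog, key=lambda vid: dot(user_vec, catalog[vid]))


def rank(user_vec, candidates, catalog, features, n):
    """Stage 2: heavier scoring, applied only to the retrieved candidates."""
    def score(vid):
        f = features[vid]
        # Hypothetical hand-set weights standing in for a trained ranker.
        return (dot(user_vec, catalog[vid])
                + 0.5 * f["engagement"]
                - 0.1 * math.log1p(f["age_days"]))
    return sorted(candidates, key=score, reverse=True)[:n]


# Toy data: 10,000 videos with 8-dimensional embeddings.
random.seed(0)
DIM = 8
catalog = {f"v{i}": [random.gauss(0, 1) for _ in range(DIM)] for i in range(10_000)}
features = {
    vid: {"engagement": random.random(), "age_days": random.randint(0, 365)}
    for vid in catalog
}
user_vec = [random.gauss(0, 1) for _ in range(DIM)]

candidates = retrieve(user_vec, catalog, k=200)            # many -> hundreds
top = rank(user_vec, candidates, catalog, features, n=10)  # hundreds -> a page
```

The ranker never sees a video that retrieval dropped, so the retrieval stage sets the ceiling on recall.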
Bottlenecks
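- Catalog size: scoring billions of videos per request is infeasible within a latency budget, so cheap retrieval narrows the funnel first.
- Feature freshness: stale signals degrade ranking quality, so a feature store serves up-to-date signals to the ranker.
- Feedback loops: popular videos get more exposure, which produces more engagement data and reinforces their popularity, so inject exploration and diversity.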
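One possible way to inject exploration and diversity is sketched below. It is an illustration, not a prescribed method: `cap_per_channel` limits how many results share a grouping key (the channel grouping is a hypothetical choice), and `inject_exploration` reserves a few slots for random picks from a pool of less-exposed videos.

```python
import random


def cap_per_channel(ranked, channel_of, max_per_channel):
    """Diversity: keep rank order but allow at most N videos per channel."""
    counts = {}
    out = []
    for vid in ranked:
        ch = channel_of[vid]
        if counts.get(ch, 0) < max_per_channel:
            out.append(vid)
            counts[ch] = counts.get(ch, 0) + 1
    return out


def inject_exploration(ranked, explore_pool, n, explore_slots, rng):
    """Exploration: fill n slots, reserving some for random pool picks."""
    head = ranked[: max(0, n - explore_slots)]
    seen = set(head)
    pool = [v for v in explore_pool if v not in seen]
    picks = rng.sample(pool, min(explore_slots, len(pool)))
    result = head + picks
    seen.update(picks)
    # Backfill from the ranked list if the pool was too small.
    for vid in ranked[len(head):]:
        if len(result) >= n:
            break
        if vid not in seen:
            result.append(vid)
            seen.add(vid)
    return result


ranked = ["a1", "a2", "a3", "b1", "c1"]
channel_of = {vid: vid[0] for vid in ranked}  # hypothetical grouping key

diverse = cap_per_channel(ranked, channel_of, max_per_channel=2)
page = inject_exploration(diverse, ["x1", "x2", "a1"], n=4,
                          explore_slots=1, rng=random.Random(0))
```

Here `diverse` drops the third video from channel `a`, and `page` keeps the top three ranked videos plus one exploratory pick. Logging which slots were exploratory lets later training separate organic engagement from exploration traffic.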
Tradeoffs
- More candidates improve recall but raise ranking cost.
- Heavier ranking models improve quality but add latency.
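Both tradeoffs can be made concrete with a back-of-the-envelope latency model. The sketch below assumes ranking cost is a fixed overhead per batch plus a per-candidate scoring cost; the model shape and every number in it are illustrative assumptions, not measurements.

```python
import math


def ranking_latency_ms(num_candidates, per_item_us, batch_size, per_batch_overhead_ms):
    """Estimated ranking latency: per-batch overhead plus per-item scoring."""
    batches = math.ceil(num_candidates / batch_size)
    return batches * per_batch_overhead_ms + num_candidates * per_item_us / 1000


def max_candidates(budget_ms, per_item_us, batch_size, per_batch_overhead_ms,
                   limit=100_000):
    """Largest candidate count whose estimated latency fits the budget."""
    n = 0
    while n < limit and ranking_latency_ms(
            n + 1, per_item_us, batch_size, per_batch_overhead_ms) <= budget_ms:
        n += 1
    return n


# Hypothetical costs: a light ranker at 20 us per item, a heavy one at 100 us.
light = ranking_latency_ms(500, per_item_us=20, batch_size=100, per_batch_overhead_ms=1.0)
heavy = ranking_latency_ms(500, per_item_us=100, batch_size=100, per_batch_overhead_ms=1.0)

light_cap = max_candidates(30, per_item_us=20, batch_size=100, per_batch_overhead_ms=1.0)
heavy_cap = max_candidates(30, per_item_us=100, batch_size=100, per_batch_overhead_ms=1.0)
```

With these made-up numbers, ranking 500 candidates takes about 15 ms with the light model and 55 ms with the heavy one, and a 30 ms budget admits roughly 1000 candidates for the light model but only 270 for the heavy one. A heavier ranker therefore forces a smaller candidate set, trading recall for per-item quality.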
Key idea
Recommendations are a two-stage funnel: cheap retrieval narrows billions of videos to hundreds, and a heavy ranker orders the survivors using rich features.