Design YouTube Video Recommendations

Generate personalized video suggestions with candidate generation and ranking.

Requirements

Suggest videos a user is likely to watch next.
Serve recommendations with low latency from a catalog of billions.
Balance relevance with freshness and diversity.

High-level design

Recommendations use a two-stage funnel: candidate generation, then ranking.

Candidate generation: cheap models retrieve a few hundred candidates from billions of videos using embeddings and approximate nearest-neighbor search.
Ranking: a heavier model scores each candidate using rich features like watch history, recency, and engagement signals.
Serving: precompute candidate sets offline where possible and rank at request time.
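
The funnel above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design: exact top-k by dot product stands in for approximate nearest-neighbor search, the ranker is a hypothetical linear model, and every name here (generate_candidates, rank, recommend, the feature and weight dictionaries) is invented for the example.

```python
import heapq

def dot(a, b):
    """Dot product of two equal-length embedding vectors."""
    return sum(x * y for x, y in zip(a, b))

def generate_candidates(user_vec, video_vecs, k):
    """Stage 1: cheap retrieval.

    Exact top-k by dot product is used here for clarity. A real system
    would replace this scan with an approximate nearest-neighbor index.
    """
    return heapq.nlargest(k, video_vecs, key=lambda vid: dot(user_vec, video_vecs[vid]))

def rank(candidates, features, weights):
    """Stage 2: heavier scoring over per-video features.

    A weighted sum is an assumption standing in for a learned ranking model.
    """
    def score(vid):
        return sum(weights[name] * features[vid][name] for name in weights)
    return sorted(candidates, key=score, reverse=True)

def recommend(user_vec, video_vecs, features, weights, k_candidates=200, k_final=10):
    """Narrow the catalog with cheap retrieval, then order the survivors."""
    candidates = generate_candidates(user_vec, video_vecs, k_candidates)
    return rank(candidates, features, weights)[:k_final]
```

Only the k_candidates survivors ever reach the ranker, so the expensive per-video features are looked up for a few hundred videos rather than for the whole catalog.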

Bottlenecks

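Catalog size: scoring billions of videos per request is infeasible at low latency, so the funnel narrows the catalog first with cheap retrieval.
Feature freshness: stale features degrade ranking, so a feature store serves up-to-date signals to the ranker.
Feedback loops: popular videos get more exposure, which earns them more engagement and still more exposure, so inject exploration and diversity to break the cycle.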
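
One possible way to inject exploration and diversity is to post-process the ranked list. The sketch below assumes two simple techniques that the text does not prescribe: a per-channel cap for diversity and epsilon-greedy slot swapping for exploration. The function names, the channel_of mapping, and explore_pool are hypothetical.

```python
import random

def diversify(ranked, channel_of, max_per_channel):
    """Keep ranked order but cap how many videos one channel contributes."""
    counts = {}
    kept = []
    for vid in ranked:
        channel = channel_of[vid]
        if counts.get(channel, 0) < max_per_channel:
            kept.append(vid)
            counts[channel] = counts.get(channel, 0) + 1
    return kept

def inject_exploration(ranked, explore_pool, epsilon, rng: random.Random):
    """Epsilon-greedy exploration over result slots.

    Each slot is replaced, with probability epsilon, by a random video
    from explore_pool that is not already in the ranked list.
    """
    pool = [vid for vid in explore_pool if vid not in ranked]
    result = []
    for vid in ranked:
        if pool and rng.random() < epsilon:
            result.append(pool.pop(rng.randrange(len(pool))))
        else:
            result.append(vid)
    return result
```

A small epsilon gives less popular videos a chance to collect engagement signals, at the cost of occasionally showing a weaker result.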

Tradeoffs

More candidates improve recall but raise ranking cost.
Heavier ranking models improve quality but add latency.
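
A rough cost model makes both tradeoffs concrete. The sketch assumes the ranker scores candidates in fixed-size batches, one batch after another; the function names and all numbers in the usage note are illustrative, not measurements.

```python
import math

def ranking_latency_ms(n_candidates, batch_size, ms_per_batch):
    """Latency of a sequential batched ranking pass (assumed cost model)."""
    return math.ceil(n_candidates / batch_size) * ms_per_batch

def max_candidates(budget_ms, batch_size, ms_per_batch):
    """Largest candidate count whose ranking pass fits in the latency budget."""
    return int(budget_ms // ms_per_batch) * batch_size
```

Under this model, with batches of 100 at 4 ms each, 500 candidates cost 20 ms of ranking time, and a 50 ms budget allows at most 1200 candidates. A heavier model that doubles ms_per_batch to 8 ms cuts that ceiling to 600.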

Key idea

Recommendations are a two-stage funnel: cheap retrieval narrows billions of videos to hundreds, and a heavy ranker orders the survivors using rich features.

Check yourself

1. Why use a two-stage funnel of candidate generation followed by ranking?

2. Why inject exploration and diversity into recommendations?