The Candidate Retrieval Stage

What retrieval does

Retrieval turns a request into a few hundred candidate items, pulled from the entire catalog in milliseconds. It cannot afford a heavy per-item model, so it leans on precomputed structures and cheap lookups.

Common retrieval sources

Co-occurrence lists, such as "users who watched this also watched that."
Embedding nearest neighbors, where a user vector finds nearby item vectors.
Popularity and trending items as a strong fallback.
Rule-based sources such as recent searches or followed creators.
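
To make the first two sources concrete, here is a minimal sketch in Python. It assumes sessions are plain lists of item ids and that embeddings are small lists of floats. The names build_cooccurrence and nearest_items are hypothetical, and the brute-force scan stands in for the approximate nearest-neighbor index a production system would more likely use.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(sessions, top_n=3):
    """Precompute, for each item, the items most often seen in the same session."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Count each ordered pair of distinct items once per session.
        for a, b in permutations(set(session), 2):
            counts[a][b] += 1
    # Keep only the top_n neighbors so serving is a single dict lookup.
    # Ties are broken by item id to keep the output deterministic.
    return {
        item: [other for other, _ in
               sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for item, c in counts.items()
    }


def nearest_items(user_vec, item_vecs, k=2):
    """Brute-force dot-product search over every item vector (illustrative only)."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    scored = sorted(item_vecs.items(),
                    key=lambda kv: (-dot(user_vec, kv[1]), kv[0]))
    return [item for item, _ in scored[:k]]


# Hypothetical toy data.
sessions = [["a", "b", "c"], ["a", "b"], ["b", "d"]]
cooc = build_cooccurrence(sessions)

item_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
neighbors = nearest_items([0.9, 0.1], item_vecs, k=2)

print(cooc["a"])   # ['b', 'c']
print(neighbors)   # ['a', 'c']
```

The co-occurrence table is built offline, so the request path only pays for a dictionary read. The embedding lookup here scans every item, which is fine for a toy catalog but would not meet a millisecond budget at real catalog sizes.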

Blending many sources

Real systems run several retrieval sources in parallel and take the union of their results. Each source covers a different intent, so blending raises overall recall. Duplicates are merged before ranking sees the pool.
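
The blending step can be sketched as follows, assuming each source is a callable that takes a request and returns a ranked list of item ids. The function name blend, the per_source_limit parameter, and the toy sources are all hypothetical. A thread pool is one simple way to run the sources concurrently; a real service might use async RPCs instead.

```python
from concurrent.futures import ThreadPoolExecutor


def blend(sources, request, per_source_limit=100):
    """Run every source concurrently, then union the results.

    sources: dict mapping source name -> callable(request) -> list of item ids.
    Returns a dict mapping item id -> list of source names that proposed it.
    """
    names = list(sources)
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool_exec:
        # map() preserves the order of `names`, so results line up with them.
        results = list(pool_exec.map(lambda n: sources[n](request), names))

    pool = {}
    for name, items in zip(names, results):
        for item in items[:per_source_limit]:
            provenance = pool.setdefault(item, [])
            if name not in provenance:
                # Merge duplicates: one entry per item, remembering who proposed it.
                provenance.append(name)
    return pool


# Hypothetical toy sources; real ones would query an index or a cache.
sources = {
    "cooccurrence": lambda req: ["b", "c"],
    "popular": lambda req: ["c", "d"],
    "followed": lambda req: ["e"],
}
pool = blend(sources, request={"user_id": 42})
print(pool)
# {'b': ['cooccurrence'], 'c': ['cooccurrence', 'popular'], 'd': ['popular'], 'e': ['followed']}
```

Keeping the list of contributing sources per item is a design choice rather than a requirement. It costs little and lets the ranking stage, or an offline analysis, see which sources are actually adding unique candidates.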

The recall target

Retrieval is judged by recall at k: of the items the user would actually engage with, what fraction appear in a candidate set of size k. A high-recall pool gives ranking room to shine; a thin pool caps the whole system.
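
As a small worked example, recall at k can be computed like this. The sketch assumes the candidate list is ordered and that the "relevant" set comes from logged engagements. The function name recall_at_k and the toy data are hypothetical.

```python
def recall_at_k(candidates, relevant, k):
    """Fraction of relevant items that appear in the first k candidates."""
    relevant = set(relevant)
    if not relevant:
        return None  # Undefined when the user engaged with nothing.
    hits = relevant & set(candidates[:k])
    return len(hits) / len(relevant)


candidates = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "z"}   # "z" was never retrieved at all

print(recall_at_k(candidates, relevant, k=3))  # one of three relevant items found
print(recall_at_k(candidates, relevant, k=5))  # two of three relevant items found
```

Item "z" illustrates the cap described above: no ranking model can surface it, because it never entered the pool.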

Key idea

Candidate retrieval blends several cheap sources to build a high-recall shortlist quickly, because ranking can only reorder what retrieval supplies.
