


Machine Learning

The Candidate Retrieval Stage

How recommenders fetch a good shortlist from a giant catalog fast.

4 min read · intro

What retrieval does

Retrieval turns a request into a few hundred candidate items pulled from the entire catalog in milliseconds. It cannot afford a heavy per-item model, so it leans on precomputed structures and cheap lookups.
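Here is a minimal Python sketch of the "precompute offline, look up online" split. It assumes a toy log of sessions, each a list of item IDs. The helper names `build_cooccurrence_index` and `retrieve` are hypothetical. The expensive pair counting runs once, ahead of time. Serving a request then costs only a few dictionary lookups.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence_index(sessions, top_n=50):
    """Offline: for each item, keep the items most often seen in the same session."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Count each pair once per session, in both directions.
        for a, b in permutations(set(session), 2):
            counts[a][b] += 1
    # Truncate to top_n neighbors so the serving-time table stays small.
    return {item: [other for other, _ in c.most_common(top_n)]
            for item, c in counts.items()}


def retrieve(index, recent_items):
    """Online: one dictionary lookup per recent item, no model inference."""
    candidates = []
    for item in recent_items:
        candidates.extend(index.get(item, []))
    return candidates


sessions = [["a", "b", "c"], ["a", "b"], ["a", "d"]]
index = build_cooccurrence_index(sessions, top_n=1)
print(retrieve(index, ["a", "unknown"]))  # ['b']
```

An unseen item such as `"unknown"` simply contributes nothing. Covering it is left to the other sources.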

Common retrieval sources

  • Co-occurrence lists, such as "users who watched this also watched that."
  • Embedding nearest neighbors, where a user vector finds nearby item vectors.
  • Popularity and trending items as a strong fallback.
  • Rule-based sources, such as recent searches or followed creators.
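The embedding and popularity sources can be sketched as follows. The assumptions are deliberately simple: vectors are plain Python lists, similarity is a dot product, and every function name is hypothetical. The exact scan over all items is only for illustration. At catalog scale it is typically replaced by an approximate nearest neighbor index.

```python
import heapq


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def embedding_neighbors(user_vec, item_vecs, k):
    """Exact top-k items by dot product with the user vector (brute-force scan)."""
    return heapq.nlargest(k, item_vecs, key=lambda item: dot(user_vec, item_vecs[item]))


def popular_items(popularity, k):
    """Top-k items by a precomputed popularity score."""
    return heapq.nlargest(k, popularity, key=popularity.get)


def embedding_or_popular(user_vec, item_vecs, popularity, k):
    # Cold-start users have no vector yet, so fall back to popularity.
    if user_vec is None:
        return popular_items(popularity, k)
    return embedding_neighbors(user_vec, item_vecs, k)


item_vecs = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [0.7, 0.7]}
popularity = {"x": 10, "y": 50, "z": 30}
print(embedding_or_popular([1.0, 0.2], item_vecs, popularity, k=2))  # ['x', 'z']
print(embedding_or_popular(None, item_vecs, popularity, k=2))        # ['y', 'z']
```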

Blending many sources

Real systems run several retrieval sources in parallel and union their results. Each source covers a different intent, so blending raises overall recall. Duplicates are merged before ranking sees the pool.
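Below is a minimal sketch of blending. It assumes each source is a hypothetical function that takes a request and returns a list of item IDs. The sources run concurrently in a thread pool. The union then keeps one entry per item and records which sources produced it.

```python
from concurrent.futures import ThreadPoolExecutor


def blend(request, sources):
    """Run every source concurrently, then union the results with duplicates merged.

    Returns {item_id: [names of sources that produced it]} in first-seen order.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fetch, request) for name, fetch in sources.items()}
        results = {name: future.result() for name, future in futures.items()}
    merged = {}
    for name, items in results.items():
        for item in items:
            # One entry per item; a repeat sighting only adds the source label.
            labels = merged.setdefault(item, [])
            if name not in labels:
                labels.append(name)
    return merged


sources = {
    "cooccurrence": lambda req: ["a", "b"],
    "embedding": lambda req: ["b", "c"],
    "popular": lambda req: ["c", "d"],
}
print(blend({"user_id": 42}, sources))
# {'a': ['cooccurrence'], 'b': ['cooccurrence', 'embedding'], 'c': ['embedding', 'popular'], 'd': ['popular']}
```

Keeping the source labels is a design choice, not a requirement. Ranking could use them as features, and they show which source contributed each candidate.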

The recall target

Retrieval is judged by recall at k (recall@k). Of the items the user would actually engage with, it measures what fraction appear among the k retrieved candidates. A high-recall pool gives ranking room to shine, while a thin pool caps the quality of the whole system.
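Recall at k can be computed directly from its definition. In this sketch the function name is hypothetical. `candidates` is the ordered retrieval output, and `relevant` is the set of items the user actually engaged with in held-out data.

```python
def recall_at_k(candidates, relevant, k):
    """Fraction of relevant items that appear among the first k candidates."""
    relevant = set(relevant)
    if not relevant:
        # Undefined when the user engaged with nothing; callers can skip such users.
        raise ValueError("recall@k is undefined with no relevant items")
    hits = relevant.intersection(candidates[:k])
    return len(hits) / len(relevant)


candidates = ["a", "b", "c", "d"]
engaged = {"b", "d", "z"}
print(recall_at_k(candidates, engaged, k=3))  # 0.3333333333333333
print(recall_at_k(candidates, engaged, k=4))  # 0.6666666666666666
```

Item `"z"` never appears in the candidate list, so recall stays at 2/3 however good the downstream ranking is.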

Key idea

Candidate retrieval blends several cheap sources to build a high-recall shortlist quickly, because ranking can only reorder what retrieval supplies.

Check yourself


1. How is the candidate retrieval stage primarily judged?

2. Why do production systems blend multiple retrieval sources?