Machine Learning

The Candidate Generation Deep

Narrowing millions of items to a few hundred before ranking.

Why a funnel exists

A catalog may hold millions of items, and scoring every one of them with a heavy ranking model on each request is too slow. Recommenders therefore use a funnel: a fast candidate generation stage narrows the catalog to a few hundred items, then a precise ranker orders those.
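
The two stages of the funnel can be sketched as follows. This is a minimal illustration under stated assumptions, not a production design: `cheap_score`, `heavy_score`, `recommend`, and the toy catalog are hypothetical stand-ins for a fast retrieval signal and an expensive ranking model.

```python
import heapq

def cheap_score(user, item):
    # Hypothetical fast signal: overlap between user interests and item tags.
    return len(user["interests"] & item["tags"])

def heavy_score(user, item):
    # Hypothetical stand-in for an expensive ranking model.
    overlap = len(user["interests"] & item["tags"])
    return overlap + 0.1 * item["quality"]

def recommend(user, catalog, n_candidates=3, n_results=2):
    # Stage 1: candidate generation keeps the top n_candidates by the cheap score.
    candidates = heapq.nlargest(
        n_candidates, catalog, key=lambda it: cheap_score(user, it)
    )
    # Stage 2: the ranker scores only the candidates, never the full catalog.
    ranked = sorted(candidates, key=lambda it: heavy_score(user, it), reverse=True)
    return [it["id"] for it in ranked[:n_results]]

# Toy data, invented for illustration.
catalog = [
    {"id": "a", "tags": {"jazz", "live"}, "quality": 2},
    {"id": "b", "tags": {"jazz"}, "quality": 9},
    {"id": "c", "tags": {"rock"}, "quality": 10},
    {"id": "d", "tags": {"jazz", "live", "vinyl"}, "quality": 1},
    {"id": "e", "tags": {"pop"}, "quality": 5},
]
user = {"interests": {"jazz", "live", "vinyl"}}
result = recommend(user, catalog)
```

Note that this toy first stage still loops over every item. In a real system the cheap stage would typically be served from an index or precomputed lists so that it does not touch each item per request.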

Generation must be fast and broad

  • It trades precision for recall, aiming to include almost every relevant item.
  • It must run in a few milliseconds over the whole catalog.

Common sources

  • Two-tower retrieval with nearest-neighbor search over item embeddings.
  • Co-occurrence lists, such as "users who liked this also liked that."
  • Popularity and trending items for freshness.
  • Personalized rules such as recent categories.
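
As one illustration of the first source, nearest-neighbor retrieval over item embeddings might look like the sketch below. The vectors are made up and the search is an exact brute-force scan; the helper names `dot` and `retrieve` are hypothetical. A real two-tower system would produce the vectors with learned user and item encoders and would usually swap the scan for an approximate nearest-neighbor index.

```python
import heapq

def dot(u, v):
    # Dot-product similarity between two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def retrieve(user_vec, item_vecs, k):
    # Exact top-k by dot product; an assumption made for clarity, not speed.
    scored = ((dot(user_vec, vec), item_id) for item_id, vec in item_vecs.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]

# Made-up 2-d embeddings standing in for item-tower output.
item_vecs = {
    "i1": [0.9, 0.1],
    "i2": [0.1, 0.9],
    "i3": [0.7, 0.6],
    "i4": [-0.5, 0.2],
}
user_vec = [1.0, 0.2]  # stand-in for user-tower output
top = retrieve(user_vec, item_vecs, k=2)
```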

Most systems blend several sources and deduplicate the union into one candidate set.
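
A minimal sketch of blending and deduplication follows. The round-robin interleaving is an assumption chosen for simplicity; the lesson does not prescribe a blending policy, and the source names and item ids are hypothetical.

```python
def blend(sources, limit):
    """Round-robin merge of ranked candidate lists, dropping duplicates."""
    seen = set()
    merged = []
    longest = max((len(items) for items in sources.values()), default=0)
    for rank in range(longest):
        for name, items in sources.items():
            if rank < len(items) and items[rank] not in seen:
                seen.add(items[rank])
                # Keep the source name so later stages can see where an item came from.
                merged.append((items[rank], name))
                if len(merged) == limit:
                    return merged
    return merged

# Hypothetical ranked outputs from three sources.
sources = {
    "two_tower": ["i1", "i3", "i7"],
    "co_occurrence": ["i3", "i9"],
    "trending": ["i1", "i5"],
}
candidates = blend(sources, limit=4)
candidate_ids = [item_id for item_id, _ in candidates]
```

Interleaving by rank gives every source a chance to contribute before the limit is reached. An item found by several sources is credited to whichever source surfaced it first.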

Measuring it

  • Judge generation by recall at K: the share of relevant items that appear in the K-item candidate set.
  • A candidate set with poor recall caps the whole system, since the ranker can only reorder what it receives.
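
Recall at K can be computed as in the sketch below. The function name `recall_at_k`, the item ids, and the choice to return 0.0 when there are no relevant items are all assumptions made for illustration.

```python
def recall_at_k(candidates, relevant, k):
    # Share of relevant items found among the first k candidates.
    if not relevant:
        return 0.0  # assumption: treat the undefined case as zero
    hits = set(candidates[:k]) & set(relevant)
    return len(hits) / len(relevant)

candidates = ["i1", "i3", "i9", "i5"]   # hypothetical candidate set, in order
relevant = {"i3", "i5", "i8"}           # hypothetical ground-truth items
r2 = recall_at_k(candidates, relevant, 2)
r4 = recall_at_k(candidates, relevant, 4)
```

Here "i8" never reaches the candidate set, so no ranker downstream can recommend it, however accurate that ranker is.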

Key idea

Candidate generation is the fast, recall-focused first stage that shrinks millions of items to a few hundred. It blends several retrieval sources, and the recall of the blended set puts a ceiling on the entire recommender.

Check yourself

1. What does candidate generation prioritize over precision?

2. Why does candidate set recall cap the whole system?