Embedding-Based Retrieval

Vectors that capture taste

Embedding-based retrieval maps users and items into the same vector space so that a user's vector lands near the vectors of items they are likely to engage with. Retrieval then becomes a nearest-neighbor search: find the item vectors closest to the user vector, typically by dot product or cosine similarity.
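As a rough illustration of retrieval as nearest-neighbor search, the sketch below scores every item against one user vector and keeps the top k. The function name `nearest_items`, the toy vectors, and the choice of dot product as the similarity are assumptions made for the example, not something the text prescribes.

```python
import numpy as np

def nearest_items(user_vec, item_vecs, k):
    """Return indices and scores of the k items with the highest dot product."""
    scores = item_vecs @ user_vec          # one similarity score per item
    top = np.argsort(-scores)[:k]          # indices sorted by descending score
    return top, scores[top]

# Toy 2-D embeddings (made up for illustration).
item_vecs = np.array([
    [1.0, 0.0],
    [0.8, 0.6],
    [0.0, 1.0],
    [-1.0, 0.0],
])
user_vec = np.array([0.9, 0.1])

top_idx, top_scores = nearest_items(user_vec, item_vecs, k=2)
print(top_idx, top_scores)
```

This exhaustive scan is exact but touches every item, so it is only practical for small catalogs.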

The two-tower model

A user tower encodes user history and context into a vector.
An item tower encodes item features into a vector in the same space.
Training pulls the vectors of engaged user-item pairs together and pushes random pairs apart.
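One common way to express "pull engaged pairs together, push random pairs apart" is a softmax loss over in-batch negatives. The sketch below assumes that objective, along with linear towers, L2 normalization, and a temperature of 0.1. All of these are illustrative choices, and names such as `W_user` and `in_batch_softmax_loss` are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def user_tower(user_feats, W_user):
    # Assumed linear tower: project user features into the shared space.
    return l2_normalize(user_feats @ W_user)

def item_tower(item_feats, W_item):
    # Assumed linear tower: project item features into the same space.
    return l2_normalize(item_feats @ W_item)

def in_batch_softmax_loss(user_vecs, item_vecs, temperature=0.1):
    """Row i of user_vecs is an engaged pair with row i of item_vecs.
    Every other item in the batch serves as a random negative."""
    logits = (user_vecs @ item_vecs.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # positives sit on the diagonal

rng = np.random.default_rng(0)
user_feats = rng.normal(size=(4, 6))   # 4 users, 6 raw features each
item_feats = rng.normal(size=(4, 5))   # 4 engaged items, 5 raw features each
W_user = rng.normal(size=(6, 3))       # both towers map into a 3-D space
W_item = rng.normal(size=(5, 3))

u = user_tower(user_feats, W_user)
v = item_tower(item_feats, W_item)
print(in_batch_softmax_loss(u, v))
```

Minimizing this loss with respect to the tower weights raises the score of each engaged pair relative to the other pairs in the batch. A real system would use deeper towers and an autodiff framework for the gradient step.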

Why two towers scale

Because the item tower never sees the user, item vectors can be precomputed offline and indexed. At request time only the user vector is computed; an approximate nearest-neighbor (ANN) index then returns close items in milliseconds, even over millions of candidates.
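The offline/online split might look roughly like the following. `ItemIndex` is a hypothetical stand-in that does exact search. A production system would swap in a real ANN index, trading a little recall for speed. The linear towers and the random catalog are likewise assumptions for the example.

```python
import numpy as np

class ItemIndex:
    """Exact-search stand-in for an ANN index (hypothetical helper)."""

    def __init__(self, item_ids, item_vecs):
        self.item_ids = np.asarray(item_ids)
        self.item_vecs = np.asarray(item_vecs, dtype=np.float32)

    def search(self, user_vec, k):
        scores = self.item_vecs @ np.asarray(user_vec, dtype=np.float32)
        k = min(k, len(scores))
        cand = np.argpartition(-scores, k - 1)[:k]    # unordered top-k
        order = cand[np.argsort(-scores[cand])]       # sort only the k winners
        return self.item_ids[order], scores[order]

rng = np.random.default_rng(0)
W_item = rng.normal(size=(8, 4))   # assumed linear item tower
W_user = rng.normal(size=(5, 4))   # assumed linear user tower

# Offline: encode the whole catalog once and build the index.
catalog_ids = np.arange(1000)
catalog_feats = rng.normal(size=(1000, 8))
index = ItemIndex(catalog_ids, catalog_feats @ W_item)

# Online: encode only the requesting user, then query the index.
user_vec = rng.normal(size=5) @ W_user
ids, scores = index.search(user_vec, k=10)
print(ids)
```

The request path runs one user-tower forward pass plus one index lookup, so its cost does not include encoding any items.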

Negative sampling

Training needs negatives. Random catalog items are easy negatives; hard negatives, items similar to positives that the user did not engage with, sharpen the decision boundary. The mix of negatives strongly shapes what the towers learn.
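A minimal sketch of mixing the two kinds of negatives follows. It assumes hard negatives are picked as the non-engaged items most similar to the positive under the current item embeddings, and easy negatives are drawn uniformly from the rest. The function `sample_negatives` and the toy data are hypothetical.

```python
import numpy as np

def sample_negatives(pos_idx, engaged, item_vecs, n_easy, n_hard, rng):
    """Return (easy, hard) negative item indices for one positive item."""
    excluded = np.fromiter(set(engaged) | {pos_idx}, dtype=int)
    allowed = np.setdiff1d(np.arange(item_vecs.shape[0]), excluded)

    # Hard negatives: non-engaged items closest to the positive.
    sims = item_vecs[allowed] @ item_vecs[pos_idx]
    hard = allowed[np.argsort(-sims)[:n_hard]]

    # Easy negatives: uniform draw from whatever is left.
    remaining = np.setdiff1d(allowed, hard)
    easy = rng.choice(remaining, size=n_easy, replace=False)
    return easy, hard

# Toy item embeddings (made up for illustration).
item_vecs = np.array([
    [1.0, 0.0],    # 0: the positive
    [0.9, 0.1],    # 1: also engaged, so never a negative
    [0.8, 0.3],    # 2: similar to the positive but not engaged
    [0.0, 1.0],    # 3
    [-1.0, 0.0],   # 4
    [0.0, -1.0],   # 5
])
rng = np.random.default_rng(0)
easy, hard = sample_negatives(0, {0, 1}, item_vecs, n_easy=2, n_hard=1, rng=rng)
print(easy, hard)
```

Changing `n_easy` and `n_hard` is one simple way to adjust the mix. Too many hard negatives can make training unstable if some of them are actually items the user would have liked.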

Strengths and limits

Embeddings generalize to items with little direct history through their features, but they compress information into a fixed-size vector, so they retrieve a strong candidate pool rather than a perfectly ordered one. A downstream ranking stage refines the order from there.

Key idea

Embedding-based retrieval places users and items in one vector space so that nearest-neighbor search surfaces candidates fast, and precomputed item vectors let it scale to millions of items.