Machine Learning

The Two Tower Retrieval Deep

Encoding users and items separately for fast nearest-neighbor recall.

5 min read · core

Why two towers

Scoring every item with a deep network on every request is far too slow at web scale. The two-tower model splits the work between a user tower and an item tower that produce embeddings independently, so item vectors can be precomputed once and searched quickly at request time.

The structure

  • The user tower encodes the user and context into a vector.
  • The item tower encodes item features into a vector in the same space.
  • Relevance is the dot product or cosine similarity of the two vectors.

Because the towers do not interact until the final dot product, every item vector can be computed without knowing the user and stored in an index built ahead of time.
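
The structure can be sketched in a few lines of NumPy. This is a minimal illustration, not a trained model: the single-matrix towers, the feature sizes, the names W_user, W_item, user_tower and item_tower, and the random inputs are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # size of the shared embedding space (assumed)

# Hypothetical one-layer "towers": each is a single random weight matrix.
W_user = rng.normal(size=(5, DIM))   # 5 user/context features -> DIM
W_item = rng.normal(size=(12, DIM))  # 12 item features -> DIM

def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def user_tower(user_features):
    return l2_normalize(np.tanh(user_features @ W_user))

def item_tower(item_features):
    return l2_normalize(np.tanh(item_features @ W_item))

# Item side: computed offline, with no knowledge of any user.
item_features = rng.normal(size=(1000, 12))
item_index = item_tower(item_features)        # shape (1000, DIM)

# User side: computed once per request.
user_vec = user_tower(rng.normal(size=(5,)))  # shape (DIM,)

# The towers meet only here. Both sides are unit length,
# so the dot product equals the cosine similarity.
scores = item_index @ user_vec                # shape (1000,)
```

Note that item_tower never receives user features, which is what allows item_index to be built before any request arrives.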

Retrieval at serving time

  • Embed the user once per request.
  • Run approximate nearest-neighbor search over the item index.
  • Return the top few hundred candidates in milliseconds.
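
The serving steps above can be sketched as follows. To stay self-contained, this example uses exact brute-force search as a stand-in for an approximate nearest-neighbor index; a production system would normally swap in an ANN library. The function name top_k_exact and the synthetic vectors are assumptions for illustration.

```python
import numpy as np

def top_k_exact(user_vec, item_index, k):
    """Exact top-k by dot product; a stand-in for an ANN index lookup."""
    scores = item_index @ user_vec
    k = min(k, scores.shape[0])
    # argpartition selects the k best in linear time without a full sort.
    candidates = np.argpartition(-scores, k - 1)[:k]
    # Sort only those k candidates by descending score.
    order = np.argsort(-scores[candidates])
    top = candidates[order]
    return top, scores[top]

rng = np.random.default_rng(0)
item_index = rng.normal(size=(10_000, 16))  # precomputed item vectors (synthetic)
user_vec = rng.normal(size=(16,))           # embedded once for this request

ids, scores = top_k_exact(user_vec, item_index, k=200)
```

Exact search costs one dot product per item, so it grows linearly with the catalog. An ANN index trades a small amount of recall for sub-linear lookup, which is what keeps latency in the millisecond range on large catalogs.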

Training tricks

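  • Use in-batch negatives: for each user, treat the other items in the batch as negatives. This is cheap because their embeddings are already computed.
  • Apply a logQ correction: subtract the log of each item's sampling probability from its logit, to offset popular items that appear as negatives too often.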
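
Both tricks fit in one loss function. The sketch below is a NumPy illustration of an in-batch softmax loss with an optional logQ term; the function name in_batch_softmax_loss, the batch size, and the sampling probabilities q are assumptions chosen for the example, and a real training loop would use an autodiff framework.

```python
import numpy as np

def in_batch_softmax_loss(user_emb, item_emb, log_q=None):
    """Mean cross-entropy where row i's positive is item i and the
    other items in the batch act as negatives."""
    logits = user_emb @ item_emb.T  # (B, B); the diagonal holds the positives
    if log_q is not None:
        # logQ correction: subtract each item's log sampling probability
        # from its column. This lowers popular items' logits relative to
        # rare ones, offsetting their over-representation as negatives.
        logits = logits - log_q[None, :]
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
B, D = 4, 8
user_emb = rng.normal(size=(B, D))
item_emb = rng.normal(size=(B, D))
# Assumed sampling probabilities: item 0 is far more popular than the rest.
q = np.array([0.7, 0.1, 0.1, 0.1])

plain = in_batch_softmax_loss(user_emb, item_emb)
corrected = in_batch_softmax_loss(user_emb, item_emb, np.log(q))
```

If every item had the same sampling probability, the correction would shift all logits by the same constant and leave the loss unchanged; it only matters when popularity is skewed.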

Key idea

Two-tower models encode users and items independently into a shared space, so item vectors can be indexed ahead of time and retrieved by fast nearest-neighbor search. In-batch negatives keep training efficient.

Check yourself

1. Why can two-tower item vectors be precomputed?

2. What makes two-tower retrieval fast at serving time?

3. What is the purpose of the logQ correction?