Machine Learning

The Two-Tower Model

Separate user and item encoders that meet only at a dot product, built for fast retrieval.

The architecture

The two-tower model has a user tower and an item tower, each a neural network. The user tower maps user features to an embedding; the item tower maps item features to an embedding of the same dimension in the same vector space. The score for a user-item pair is the dot product of the two embeddings.
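To make the shapes concrete, here is a minimal NumPy sketch of the forward pass. The one-hidden-layer towers, the feature sizes (8 and 12), the embedding size (4), and the helper names `make_tower` and `encode` are all illustrative assumptions, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_tower(in_dim, hidden_dim, out_dim):
    """Random weights for a one-hidden-layer MLP (hypothetical, untrained)."""
    return {
        "W1": rng.normal(scale=0.1, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(scale=0.1, size=(hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def encode(tower, features):
    """Map a (batch, in_dim) feature matrix to (batch, out_dim) embeddings."""
    hidden = np.maximum(features @ tower["W1"] + tower["b1"], 0.0)  # ReLU
    return hidden @ tower["W2"] + tower["b2"]

# The towers take different inputs but emit embeddings of the same size.
user_tower = make_tower(in_dim=8, hidden_dim=16, out_dim=4)
item_tower = make_tower(in_dim=12, hidden_dim=16, out_dim=4)

user_features = rng.normal(size=(1, 8))    # one user
item_features = rng.normal(size=(5, 12))   # five items

user_emb = encode(user_tower, user_features)    # shape (1, 4)
item_embs = encode(item_tower, item_features)   # shape (5, 4)

# The only place the two towers meet: a dot product per user-item pair.
scores = user_emb @ item_embs.T                 # shape (1, 5)
```

Note that `encode(item_tower, ...)` never sees `user_features`, which is the property the rest of the lesson relies on.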

Why the towers stay separate

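Because the towers share no layers or inputs and interact only through the final dot product, item embeddings depend only on item features. That means you can compute every item embedding once, offline, and store the results in an index. At serving time you compute only the user embedding, then run a fast nearest-neighbor search (usually approximate) over the index for the items with the highest dot product.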
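The offline/online split can be sketched as follows. This is a simplified illustration: the linear "towers" `user_W` and `item_W`, the catalog size, and the function name `retrieve` are assumptions, and the exact brute-force search here stands in for the approximate nearest-neighbor index a production system would likely use.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 4

# Stand-in linear towers (hypothetical; real towers would be trained networks).
user_W = rng.normal(size=(8, EMBED_DIM))
item_W = rng.normal(size=(12, EMBED_DIM))

# Offline: embed the whole catalog once and keep the result as the index.
catalog_features = rng.normal(size=(1000, 12))
item_index = catalog_features @ item_W          # shape (1000, 4)

def retrieve(user_features, k=5):
    """Online: embed one user, then return the k highest-scoring item ids."""
    user_emb = user_features @ user_W           # shape (4,)
    scores = item_index @ user_emb              # one dot product per item
    top = np.argsort(-scores)[:k]               # exact search, for clarity
    return top, scores[top]

query_features = rng.normal(size=8)
ids, top_scores = retrieve(query_features, k=5)
```

Only `retrieve` runs per request; `item_index` is rebuilt only when items or the item tower change.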

Training

  • Build batches of positive user-item pairs from interactions.
  • Use in-batch negatives: for each user, the items paired with the other users in the batch act as negatives. This is cheap, because those item embeddings are already computed, and effective.
  • Optimize a contrastive or softmax loss so the true item scores higher than the negatives.
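The steps above can be sketched as a softmax loss over the batch's score matrix. This is one common formulation, shown under assumptions: the function name `in_batch_softmax_loss` is hypothetical, row i of each matrix is taken to be one positive pair, and the toy embeddings are hand-picked rather than produced by towers.

```python
import numpy as np

def in_batch_softmax_loss(user_embs, item_embs):
    """Softmax cross-entropy where row i's positive is item i.

    user_embs, item_embs: (batch, dim) arrays; row i of each is one
    positive user-item pair. All other items in the batch are negatives.
    """
    logits = user_embs @ item_embs.T                      # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # positives on the diagonal

# Toy batch of 4 pairs with one-hot embeddings (illustrative only).
item_embs = np.eye(4)
aligned_users = 5.0 * np.eye(4)          # each user points at its own item
shuffled_users = 5.0 * np.eye(4)[::-1]   # each user points at a wrong item

aligned_loss = in_batch_softmax_loss(aligned_users, item_embs)
shuffled_loss = in_batch_softmax_loss(shuffled_users, item_embs)
```

Here `aligned_loss` is smaller than `shuffled_loss`, which is the behavior training pushes toward: each user's true item outscores the other items in the batch.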

Where it fits

The two-tower model is the workhorse of candidate generation. Its separability is exactly what makes billion-scale retrieval possible. The cost is that user and item meet only at the dot product, so the model cannot use rich user-item cross features; a heavier ranker handles those later, on the much smaller candidate set.

Key idea

The two-tower model encodes users and items separately into one space and scores by dot product, so item embeddings can be precomputed for fast nearest-neighbor retrieval.

Check yourself

1. Why can item embeddings be precomputed in a two-tower model?

2. What are in-batch negatives?

3. Where does the two-tower model fit in the recommendation pipeline?