The Two-Tower Model

Separate user and item encoders that meet only at a dot product, built for fast retrieval.

The architecture

The two-tower model has a user tower and an item tower, each a neural network. The user tower maps user features to an embedding; the item tower maps item features to an embedding of the same dimension in the same vector space. The score for a user-item pair is the dot product of the two embeddings.
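A minimal sketch of this architecture in plain Python may help make it concrete. The helper make_tower, the one-layer tanh network, the feature sizes, and the random weights are all assumptions chosen for illustration; a real tower would be a deeper, trained network.

```python
import math
import random


def make_tower(in_dim, out_dim, seed):
    """Return a hypothetical one-layer tower: embedding = tanh(W @ features)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def tower(features):
        return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]

    return tower


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# User and item features may differ in size; the embeddings must not.
user_tower = make_tower(in_dim=4, out_dim=3, seed=0)
item_tower = make_tower(in_dim=5, out_dim=3, seed=1)

user_embedding = user_tower([1.0, 0.0, 0.5, 0.2])
item_embedding = item_tower([0.3, 1.0, 0.0, 0.0, 0.7])

# The towers meet only here, at the dot product.
score = dot(user_embedding, item_embedding)
```

Note that neither tower ever receives the other's input; the only interaction between user and item is the final dot product.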

Why the towers stay separate

Because the towers exchange no information before the final dot product, each item embedding depends only on that item's features. You can therefore precompute every item embedding offline, once per model version, and store the results in a nearest-neighbor index. At serving time you compute only the user embedding, then run a fast nearest-neighbor search against the index.
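The offline and online split can be sketched as follows. The function names build_index and retrieve are hypothetical, the towers are identity functions standing in for trained networks, and the exhaustive top-k scan stands in for the approximate nearest-neighbor index a production system would likely use.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_index(item_features, item_tower):
    """Offline: run the item tower once per item and store the result."""
    return {item_id: item_tower(f) for item_id, f in item_features.items()}


def retrieve(user_features, user_tower, index, k):
    """Online: one user-tower call, then a search over stored embeddings."""
    u = user_tower(user_features)
    # Exhaustive scan for clarity; an assumption standing in for a real index.
    return heapq.nlargest(k, index, key=lambda item_id: dot(u, index[item_id]))


def identity(features):
    """Hypothetical tower: treats the features as the embedding."""
    return features


index = build_index({"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}, identity)
top = retrieve([1.0, 0.2], identity, index, k=2)
print(top)  # ['a', 'c']
```

The item tower runs only inside build_index, so the per-request cost is one user-tower call plus the search.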

Training

Build batches of positive user-item pairs from logged interactions.
Use in-batch negatives: for each user, the other items in the batch act as negatives, which is cheap because their embeddings are already computed.
Optimize a contrastive or softmax loss so that each user's true item scores higher than the negatives.
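The steps above can be sketched as a softmax loss over in-batch negatives. This assumes the embeddings arrive as plain lists where row i of the users is paired with row i of the items; the function name in_batch_softmax_loss and the toy vectors are illustrative, and a real training loop would compute this in an ML framework with gradients.

```python
import math


def in_batch_softmax_loss(user_embs, item_embs):
    """Mean cross-entropy where user i's positive is item i and the
    other items in the batch serve as negatives."""
    losses = []
    for i, u in enumerate(user_embs):
        logits = [sum(x * y for x, y in zip(u, v)) for v in item_embs]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


users = [[2.0, 0.0], [0.0, 2.0]]

# Each user aligned with its own item: low loss.
aligned = in_batch_softmax_loss(users, [[2.0, 0.0], [0.0, 2.0]])

# Each user aligned with the other user's item: high loss.
swapped = in_batch_softmax_loss(users, [[0.0, 2.0], [2.0, 0.0]])
```

Minimizing this loss pushes each user embedding toward its paired item and away from the rest of the batch, which is why aligned comes out much smaller than swapped.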

Where it fits

The two-tower model is the workhorse of candidate generation. Its separability is exactly what makes billion-scale retrieval possible. Because the towers interact only through the dot product, it cannot model rich user-item cross features, so a heavier ranker handles those in a later stage.

Key idea

The two-tower model encodes users and items separately into one space and scores by dot product, so item embeddings can be precomputed for fast nearest-neighbor retrieval.
