The Two Tower Retrieval Deep

Why two towers

Scoring every item with a deep network on every request is far too slow at web scale. The two-tower model splits the work between a user tower and an item tower that produce embeddings independently, so item vectors can be precomputed offline and searched quickly at serving time.

The structure

The user tower encodes the user and context into a vector.
The item tower encodes item features into a vector in the same space.
Relevance is the dot product or cosine similarity of the two vectors.

Because the towers never mix until the final dot product, all item vectors live in an index built ahead of time.
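
As a rough illustration of this structure, the sketch below uses toy one-layer linear towers in plain Python. The helper names (make_tower, encode, dot), the feature sizes, and the random weights are all assumptions made for the example; real towers are deep networks trained on interaction data.

```python
import random

def make_tower(in_dim, out_dim, seed):
    """Toy stand-in for a tower: one random linear layer (out_dim x in_dim)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

def encode(tower, features):
    """Apply the tower to a feature vector (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, features)) for row in tower]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# The towers take different inputs but emit vectors of the same size.
user_tower = make_tower(in_dim=4, out_dim=3, seed=0)
item_tower = make_tower(in_dim=5, out_dim=3, seed=1)

# Hypothetical item features; item vectors are computed once, ahead of time.
item_features = {
    "item_a": [1.0, 0.0, 0.0, 0.3, 0.1],
    "item_b": [0.0, 1.0, 0.0, 0.8, 0.4],
    "item_c": [0.0, 0.0, 1.0, 0.5, 0.9],
}
item_index = {item_id: encode(item_tower, f) for item_id, f in item_features.items()}

# At request time only the user tower runs; scoring is a dot product per item.
user_vec = encode(user_tower, [1.0, 0.0, 0.5, 0.2])
scores = {item_id: dot(user_vec, vec) for item_id, vec in item_index.items()}
```

Note that the item tower never sees user features, which is what allows item_index to be built before any request arrives.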

Retrieval at serving time

Embed the user once per request.
Run approximate nearest neighbor search over the item index.
Return the top few hundred candidates in milliseconds.
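
The serving steps above can be sketched as follows. This example assumes hand-written two-dimensional vectors and uses an exact brute-force scan as a stand-in for the approximate nearest neighbor index; retrieve_top_k is a hypothetical helper, not part of any library.

```python
import heapq

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def retrieve_top_k(user_vec, item_index, k):
    """Exact top-k by dot product. A production system would replace this
    linear scan with an approximate nearest neighbor index."""
    scored = ((dot(user_vec, vec), item_id) for item_id, vec in item_index.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]

# Precomputed item vectors (illustrative values only).
item_index = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.7, 0.7],
    "d": [-1.0, 0.0],
}

user_vec = [0.9, 0.1]  # output of the user tower, computed once per request
top = retrieve_top_k(user_vec, item_index, k=2)
print(top)  # ['a', 'c']
```

The exact scan costs time linear in the number of items; an approximate index trades a small amount of recall for sublinear lookups, which is what makes millisecond latency feasible on large catalogs.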

Training tricks

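Use in-batch negatives: treat the other items in the batch as negatives for each user, which is cheap because their embeddings are already computed.
Apply a logQ correction: subtract the log of each item's estimated sampling probability from its logit, which offsets the penalty on popular items that appear as negatives too often.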
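
A minimal sketch of both tricks together, assuming a batch where row i of the user vectors is paired with row i of the item vectors, and where log_q[j] holds the log of item j's estimated probability of appearing in a batch. The function name and the tiny inputs are illustrative assumptions; a real setup would compute this with a tensor library and estimate the probabilities from the data stream.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def in_batch_softmax_loss(user_vecs, item_vecs, log_q):
    """Mean softmax cross-entropy over a batch.
    For user i, item i is the positive and every other item is a negative.
    Each logit is corrected by subtracting log_q[j] (the logQ correction)."""
    n = len(user_vecs)
    total = 0.0
    for i in range(n):
        logits = [dot(user_vecs[i], item_vecs[j]) - log_q[j] for j in range(n)]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / n

users = [[1.0, 0.0], [0.0, 1.0]]
items = [[1.0, 0.0], [0.0, 1.0]]

# With no correction, matched pairs give a lower loss than mismatched ones.
aligned = in_batch_softmax_loss(users, items, log_q=[0.0, 0.0])
swapped = in_batch_softmax_loss(users, items[::-1], log_q=[0.0, 0.0])

# Item 0 is assumed to be far more popular than item 1, so its logit is
# lowered more when it is scored.
corrected = in_batch_softmax_loss(users, items, log_q=[math.log(0.5), math.log(0.01)])
```

When every item has the same sampling probability the correction shifts all logits equally and the loss is unchanged, so the correction only matters when popularity is skewed.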

Key idea

Two-tower models encode users and items independently into a shared space, so item vectors can be indexed ahead of time and retrieved by fast nearest neighbor search; training is made efficient with in-batch negatives.
