The Cross Encoder Versus the Bi Encoder

Two ways to compare

A bi encoder embeds each text independently into a vector, then compares the vectors with a cheap similarity function such as cosine similarity or a dot product. A cross encoder feeds both texts together through one transformer and outputs a single relevance score for the pair.
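To make the difference in interface concrete, here is a minimal Python sketch with toy stand-ins rather than real transformers. The names `embed`, `bi_encoder_score`, and `cross_encoder_score` are hypothetical; the "bi encoder" is a feature-hashing bag of words, and the "cross encoder" is a hand-written joint score over shared words and shared adjacent word pairs. Only the shape matters here: the bi encoder's score depends on two vectors computed separately, while the cross encoder's score is computed from both texts at once.

```python
import math
import re
import zlib
from collections import Counter

DIM = 64  # hypothetical embedding size for the toy bi encoder


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a stand-in for a real subword tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> list[float]:
    """Toy bi encoder: maps ONE text to a unit vector, independently of any other text."""
    vec = [0.0] * DIM
    for tok in tokenize(text):
        # Feature hashing: each token bumps one of DIM buckets (crc32 is deterministic).
        vec[zlib.crc32(tok.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def bi_encoder_score(query: str, doc: str) -> float:
    # Each side is embedded separately; cosine of unit vectors is just a dot product.
    q_vec, d_vec = embed(query), embed(doc)
    return sum(a * b for a, b in zip(q_vec, d_vec))


def cross_encoder_score(query: str, doc: str) -> float:
    """Toy cross encoder: reads BOTH texts together and returns one score.

    Half the score is word overlap, half is overlap of adjacent word pairs,
    so it can react to word order across the two texts.
    """
    q, d = tokenize(query), tokenize(doc)
    if not q or not d:
        return 0.0
    word_overlap = sum((Counter(q) & Counter(d)).values()) / len(q)
    pair_overlap = sum((Counter(zip(q, q[1:])) & Counter(zip(d, d[1:]))).values())
    pair_overlap /= max(len(q) - 1, 1)
    return 0.5 * word_overlap + 0.5 * pair_overlap
```

With these toys, "dog bites man" scores about 1.0 against "man bites dog" under the bi encoder, because the hashed bag of words ignores order, while the joint scorer gives the reversed sentence 0.5 and an identical sentence 1.0. This is a property of the toy models only; real bi encoders are not simply order-blind.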

The tradeoff

The bi encoder is fast and scalable: you embed a corpus once and reuse the vectors for every query, enabling search over millions of items.
The cross encoder is typically more accurate because attention runs across both texts at once, capturing fine-grained interactions between their words, but it must run afresh for every query-document pair, so nothing can be precomputed.
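The cost asymmetry can be simulated by counting model passes. In this sketch `ToyModels`, `count_passes`, and the pass counters are hypothetical names, and both models are cheap toys standing in for transformers (a hashed bag-of-words embedding and a word-set overlap score); only the counting is the point. The bi encoder embeds the corpus once and then spends one pass per query, while the cross encoder spends one pass per query-document pair.

```python
import math
import re
import zlib

DIM = 64  # hypothetical toy embedding size


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyModels:
    """Toy bi encoder and cross encoder that count simulated transformer passes."""

    def __init__(self) -> None:
        self.bi_passes = 0
        self.cross_passes = 0

    def embed(self, text: str) -> list[float]:
        self.bi_passes += 1  # one pass per text embedded
        vec = [0.0] * DIM
        for tok in tokens(text):
            vec[zlib.crc32(tok.encode()) % DIM] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    def cross_score(self, query: str, doc: str) -> float:
        self.cross_passes += 1  # one pass per (query, doc) pair
        q, d = set(tokens(query)), set(tokens(doc))
        return len(q & d) / len(q | d) if q | d else 0.0


def count_passes(corpus: list[str], queries: list[str]) -> dict:
    m = ToyModels()
    # Bi encoder: embed the corpus once (offline) and reuse the vectors.
    index = [m.embed(doc) for doc in corpus]
    bi_best = []
    for query in queries:
        q_vec = m.embed(query)  # the only model pass at query time
        sims = [sum(a * b for a, b in zip(q_vec, d_vec)) for d_vec in index]
        bi_best.append(max(range(len(corpus)), key=sims.__getitem__))
    # Cross encoder: nothing is cached, so every pair needs a fresh pass.
    cross_best = []
    for query in queries:
        scores = [m.cross_score(query, doc) for doc in corpus]
        cross_best.append(max(range(len(corpus)), key=scores.__getitem__))
    return {
        "bi_passes": m.bi_passes,
        "cross_passes": m.cross_passes,
        "bi_best": bi_best,
        "cross_best": cross_best,
    }
```

With N documents and Q queries, the bi encoder needs N + Q passes, and the N corpus passes are paid once; the cross encoder needs N times Q passes, all of them at query time. For 5 documents and 3 queries that is 8 versus 15.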

Why not always use the cross encoder

Scoring a query against a million documents with a cross encoder means a million expensive transformer passes per query, which is far too slow for interactive search. The bi encoder precomputes the document vectors offline, so the query side is a single embedding plus a fast (often approximate) nearest neighbor search over those vectors.
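Once the document vectors exist, the query side reduces to one embedding plus a nearest neighbor lookup. Below is a minimal sketch of that lookup, assuming unit-length vectors so that the dot product equals cosine similarity. `top_k` is a hypothetical helper that does an exact linear scan; at the million-item scale, systems typically swap in an approximate nearest neighbor index instead.

```python
import heapq


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int) -> list[int]:
    """Exact maximum inner product search: indices of the k best-scoring documents.

    doc_vecs were computed once, offline; no model runs here, only dot products.
    """
    scored = (
        (sum(q * d for q, d in zip(query_vec, vec)), i)
        for i, vec in enumerate(doc_vecs)
    )
    # nlargest keeps a size-k heap, so this is O(N log k) for N documents.
    return [i for _, i in heapq.nlargest(k, scored)]


# Usage: four precomputed 2-d unit vectors and one embedded query.
doc_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0], [-1.0, 0.0]]
print(top_k([0.8, 0.6], doc_vecs, 2))  # [1, 0]
```

The per-query cost is one embedding plus N short dot products, which is far cheaper than N transformer passes even before any approximate index is used.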

The standard pattern

A common production design is retrieve then rerank:

The bi encoder quickly retrieves a few hundred candidates.
The cross encoder reranks just those candidates for final precision.

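This combines the speed of the bi encoder with the accuracy of the cross encoder: the expensive model only sees a few hundred pairs per query instead of the whole corpus.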
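A minimal end-to-end sketch of retrieve then rerank, again with toy models in place of transformers: `embed` is a hashed bag of words and `cross_score` is a hand-written joint score over shared words and word pairs (both hypothetical). `build_index` runs offline; `search` embeds the query once, keeps the `n_candidates` best documents by dot product, and lets the cross encoder reorder only those. The defaults of 100 candidates and 10 results are illustrative, not recommendations.

```python
import heapq
import math
import re
import zlib
from collections import Counter

DIM = 256  # hypothetical toy embedding size


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> list[float]:
    """Toy bi encoder: hashed bag of words, normalized to unit length."""
    vec = [0.0] * DIM
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cross_score(query: str, doc: str) -> float:
    """Toy cross encoder: joint score from shared words and shared adjacent word pairs."""
    q, d = tokenize(query), tokenize(doc)
    if not q or not d:
        return 0.0
    words = sum((Counter(q) & Counter(d)).values()) / len(q)
    pairs = sum((Counter(zip(q, q[1:])) & Counter(zip(d, d[1:]))).values())
    return 0.5 * words + 0.5 * pairs / max(len(q) - 1, 1)


def build_index(corpus: list[str]) -> list[list[float]]:
    # Offline: one bi encoder pass per document, done once and stored.
    return [embed(doc) for doc in corpus]


def search(query, corpus, index, n_candidates=100, n_final=10):
    # Stage 1 (retrieve): one query embedding plus dot products against the index.
    q_vec = embed(query)
    scored = ((sum(a * b for a, b in zip(q_vec, d_vec)), i) for i, d_vec in enumerate(index))
    candidates = [i for _, i in heapq.nlargest(n_candidates, scored)]
    # Stage 2 (rerank): the cross encoder scores only the retrieved candidates.
    reranked = sorted(candidates, key=lambda i: cross_score(query, corpus[i]), reverse=True)
    return reranked[:n_final]
```

On a toy corpus that contains both "dog bites man" and "man bites dog", the bag-of-words retriever cannot tell the two apart, but the reranker puts the exact match first for the query "dog bites man". The cross encoder's cost is bounded by `n_candidates`, not by the corpus size.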

Key idea