The Reranker Stage

Retrieve wide, then sharpen

First-stage retrieval favors speed, so it casts a wide net and ranks only loosely. A reranker is a second, slower model that looks closely at a small shortlist and reorders it for precision. The pattern is: retrieve many, rerank a few.
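
The two-stage pattern can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the function name retrieve_then_rerank and both scorers (shared_words, jaccard) are hypothetical placeholders standing in for a real retriever and a real reranker.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    cheap_score: Scorer,
    precise_score: Scorer,
    shortlist_size: int = 50,
    final_size: int = 5,
) -> List[Tuple[str, float]]:
    # Stage 1: score every document with the cheap scorer and keep a wide shortlist.
    shortlist = sorted(
        corpus, key=lambda doc: cheap_score(query, doc), reverse=True
    )[:shortlist_size]
    # Stage 2: rescore only the shortlist with the slower, more careful scorer.
    rescored = [(doc, precise_score(query, doc)) for doc in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:final_size]


def shared_words(query: str, doc: str) -> float:
    # Placeholder retriever score: number of distinct words in common.
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def jaccard(query: str, doc: str) -> float:
    # Placeholder reranker score: shared words divided by all distinct words.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if (q | d) else 0.0


corpus = [
    "reset password policy history for company admins and auditors",
    "how to reset password",
    "cooking pasta at home",
]
top = retrieve_then_rerank(
    "reset password", corpus, shared_words, jaccard, shortlist_size=2, final_size=1
)
```

In this toy run the first stage ties the two password documents and keeps both. The second stage then moves the shorter, more focused one to the top.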

Why a separate model

The retriever is typically a bi-encoder: it encodes the query and each document separately, then compares the resulting vectors, which is fast but coarse. A reranker is usually a cross-encoder that reads the query and a document together, letting it judge fine-grained relevance that the bi-encoder cannot.
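
A toy contrast shows the difference between the two interfaces. The functions here are hypothetical stand-ins, not real neural models: encode plays the bi-encoder (it sees one text at a time), and joint_score plays the cross-encoder (it sees the query and the document together).

```python
import math
from collections import Counter
from typing import List


def encode(text: str) -> Counter:
    # Stand-in for a bi-encoder: turns ONE text into a bag-of-words vector.
    # It never sees the other side of the pair.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def joint_score(query: str, doc: str) -> float:
    # Stand-in for a cross-encoder: reads query and document TOGETHER and
    # returns the fraction of the query's ordered word pairs found in the document.
    q, d = query.lower().split(), doc.lower().split()
    q_pairs, d_pairs = set(zip(q, q[1:])), set(zip(d, d[1:]))
    return len(q_pairs & d_pairs) / max(len(q_pairs), 1)


query = "dog bites man"
docs: List[str] = ["man bites dog", "dog bites man"]

# Document vectors can be computed ahead of time, before any query arrives.
doc_vectors = [encode(doc) for doc in docs]
query_vector = encode(query)
separate_scores = [cosine(query_vector, vec) for vec in doc_vectors]

# The joint scorer needs the actual pair, so nothing can be precomputed.
joint_scores = [joint_score(query, doc) for doc in docs]
```

Scored separately, the two documents are indistinguishable because they contain the same words. Scored jointly, only the document with the matching word order gets credit. The same trade-off appears in real models: separate encoding allows precomputation, while joint reading captures interactions between query and document.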

The cost

A cross-encoder must run once per query-document pair, so it is far too slow to score the whole database. That is exactly why it sits after retrieval, scoring only the shortlist, perhaps the top fifty candidates.
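
A back-of-the-envelope calculation makes the gap concrete. The corpus size and per-pair latency below are assumed numbers chosen only for illustration. The shortlist of fifty comes from the text.

```python
def rerank_cost_ms(num_pairs: int, ms_per_pair: int) -> int:
    # One cross-encoder pass per query-document pair.
    return num_pairs * ms_per_pair


corpus_size = 1_000_000  # assumed corpus size
shortlist_size = 50      # shortlist size mentioned in the text
ms_per_pair = 5          # assumed latency of one cross-encoder pass

full_corpus_ms = rerank_cost_ms(corpus_size, ms_per_pair)   # score everything
shortlist_ms = rerank_cost_ms(shortlist_size, ms_per_pair)  # score the shortlist only
speedup = full_corpus_ms // shortlist_ms

print(f"whole corpus: {full_corpus_ms / 1000:.0f} s per query")
print(f"shortlist:    {shortlist_ms} ms per query")
print(f"ratio:        {speedup}x")
```

Under these assumptions, scoring the whole corpus takes well over an hour per query, while scoring the shortlist takes a quarter of a second.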

Practical payoff

Higher precision at the very top, where it matters most for answers.
Better grounding for downstream generation, since the best passages rise first.
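
One common way to measure precision at the top is precision at k: the fraction of the first k results that are relevant. The sketch below uses made-up document ids and relevance labels purely to show the calculation. It is not a measured result.

```python
from typing import List, Set


def precision_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    # Fraction of the top k results that are labeled relevant.
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k


# Hypothetical labels and orderings, for illustration only.
relevant = {"d2", "d5"}
before_rerank = ["d1", "d2", "d3", "d4", "d5"]
after_rerank = ["d2", "d5", "d1", "d3", "d4"]

p_before = precision_at_k(before_rerank, relevant, k=2)
p_after = precision_at_k(after_rerank, relevant, k=2)
```

The shortlist contains the same five documents in both orderings. Only the order changes, and that alone moves precision at 2 from 0.5 to 1.0 in this example.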

Key idea

A reranker is a slow cross-encoder that reads the query and a document together to reorder a retrieved shortlist. It buys precision at the top and stays affordable by running on only a few candidates.

Check yourself