Retrieve wide, then sharpen
First-stage retrieval favors speed, so it casts a wide net and may rank loosely. A reranker is a second, slower model that examines a small shortlist closely and reorders it for precision. The pattern is: retrieve many, rerank a few.
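The two-stage pattern can be sketched as a function that takes a cheap scorer and an expensive scorer. This is a minimal sketch, not a real retrieval stack: `retrieve_then_rerank`, `word_overlap`, and `overlap_plus_phrase` are hypothetical names, and the toy scorers only stand in for a real retriever and reranker.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]

def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    fast_score: Scorer,
    slow_score: Scorer,
    n_retrieve: int = 50,
    n_final: int = 5,
) -> List[Tuple[str, float]]:
    # Stage 1: the cheap scorer sees the whole corpus; keep a wide shortlist.
    shortlist = sorted(corpus, key=lambda d: fast_score(query, d), reverse=True)[:n_retrieve]
    # Stage 2: the expensive scorer sees only the shortlist and reorders it.
    rescored = [(d, slow_score(query, d)) for d in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:n_final]


# Hypothetical toy scorers, for illustration only.
def word_overlap(query: str, doc: str) -> float:
    # Coarse: counts shared words and ignores word order.
    return float(len(set(query.split()) & set(doc.split())))

def overlap_plus_phrase(query: str, doc: str) -> float:
    # Finer: the same overlap, plus a bonus when the exact phrase appears.
    return word_overlap(query, doc) + (10.0 if query in doc else 0.0)


corpus = ["man bites dog", "dog bites man", "cat sleeps", "a dog bites a man today"]
top = retrieve_then_rerank("dog bites man", corpus, word_overlap, overlap_plus_phrase,
                           n_retrieve=3, n_final=2)
print(top)  # [('dog bites man', 13.0), ('man bites dog', 3.0)]
```

The coarse scorer ties three documents; the finer scorer, run only on those three, separates them.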
Why a separate model
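The retriever is typically a bi-encoder: it encodes the query and each document separately into vectors, then compares the vectors. This is fast, because document vectors can be computed once ahead of time, but coarse, because the two texts never interact before the comparison. A reranker is usually a cross-encoder that reads the query and a document together as one input, letting it judge fine-grained relevance that a bi-encoder cannot.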
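The difference in interface can be shown with toy stand-ins. This sketch does not model a neural cross-encoder (which is typically a transformer reading the concatenated query and document); the hypothetical `cross_encoder_score` only mimics its shape, a function of the pair, and uses word order as an example of a signal that a bag-of-words vector discards. `encode`, `cosine`, and `bi_encoder_score` are likewise illustrative names.

```python
from collections import Counter
from math import sqrt
from typing import List

def encode(text: str) -> Counter:
    # Toy "bi-encoder": a bag-of-words count vector; word order is lost.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def bi_encoder_score(q_vec: Counter, d_vec: Counter) -> float:
    # Query and document were encoded separately; only the vectors meet.
    return cosine(q_vec, d_vec)

def cross_encoder_score(query: str, doc: str) -> float:
    # Toy "cross-encoder": reads both texts at once and checks which
    # ordered word pairs of the query also appear in the document.
    q, d = query.lower().split(), doc.lower().split()
    q_bigrams = set(zip(q, q[1:]))
    if not q_bigrams:
        return 0.0
    return len(q_bigrams & set(zip(d, d[1:]))) / len(q_bigrams)


query = "dog bites man"
docs: List[str] = ["man bites dog", "dog bites man"]
doc_vecs = [encode(d) for d in docs]  # can be precomputed offline
q_vec = encode(query)

bi_scores = [bi_encoder_score(q_vec, v) for v in doc_vecs]
cross_scores = [cross_encoder_score(query, d) for d in docs]
print([round(s, 3) for s in bi_scores])  # [1.0, 1.0]
print(cross_scores)                      # [0.0, 1.0]
```

The bag-of-words vectors cannot tell the two documents apart, while the pairwise scorer can; the price is that it must be called once per pair instead of reusing stored vectors.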
The cost
A cross-encoder must run a full model pass for every query-document pair, and nothing can be precomputed per document, so it is far too slow to score the whole database. That is exactly why it sits after retrieval, scoring only a shortlist of perhaps the top fifty candidates.
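The cost argument can be checked by counting calls. In this minimal sketch, `CountingScorer` and `rerank_shortlist` are hypothetical helpers, and the lambdas are arbitrary stand-in scorers; only the call counts matter. Real first stages usually query a vector index rather than scanning every document, so the linear scan here is a simplification.

```python
import heapq
from typing import Callable, List

class CountingScorer:
    """Wraps a pairwise (query, doc) scorer and counts how often it runs."""

    def __init__(self, fn: Callable[[str, str], float]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, query: str, doc: str) -> float:
        self.calls += 1
        return self.fn(query, doc)


def rerank_shortlist(query: str, corpus: List[str], fast: CountingScorer,
                     slow: CountingScorer, k: int = 50) -> List[str]:
    # Stage 1: one cheap score per document; nlargest keeps the top k.
    shortlist = heapq.nlargest(k, corpus, key=lambda d: fast(query, d))
    # Stage 2: one expensive score per shortlisted document only.
    return sorted(shortlist, key=lambda d: slow(query, d), reverse=True)


# Hypothetical stand-in scorers and corpus.
corpus = [f"doc {i}" for i in range(10_000)]
fast = CountingScorer(lambda q, d: float(len(d)))
slow = CountingScorer(lambda q, d: float(d.count("9")))
result = rerank_shortlist("any query", corpus, fast, slow, k=50)
print(fast.calls, slow.calls)  # 10000 50
```

For a corpus of N documents and a shortlist of k, the expensive scorer runs k times per query instead of N times.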
Practical payoff
- Higher precision at the very top, where it matters most for answers.
- Better grounding for downstream generation, since the best passages rise first.
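Precision at the top can be made concrete with precision@k, the fraction of the top k results that are relevant. The document IDs, rankings, and relevance labels below are made up for illustration, and `precision_at_k` is a hypothetical helper.

```python
from typing import List, Set

def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    # Fraction of the top-k positions occupied by relevant documents.
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


# Hypothetical shortlist and relevance labels.
relevant = {"d3", "d7"}
retrieved = ["d1", "d3", "d5", "d7", "d2"]  # first-stage order
reranked = ["d3", "d7", "d1", "d5", "d2"]   # same five docs, reordered

print(precision_at_k(retrieved, relevant, 2), precision_at_k(reranked, relevant, 2))  # 0.5 1.0
print(precision_at_k(retrieved, relevant, 5), precision_at_k(reranked, relevant, 5))  # 0.4 0.4
```

Because a reranker only reorders the shortlist, precision over the full shortlist is unchanged; the gain appears at small k. By the same token, a reranker cannot recover a relevant document that first-stage retrieval never returned.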
Key idea
A reranker is a slow cross-encoder that reads query and document together to reorder a retrieved shortlist, buying precision at the top by running only on a few candidates.