Recall is the ceiling
If the passage that holds the answer is never retrieved, no reranker or generator can recover it. Recall, the fraction of needed passages that make it into the candidate set, is therefore the ceiling on everything downstream. Tuning recall means making sure the right passage is in the pool before any reranking begins.
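One way to make this ceiling concrete is recall at $k$. The notation below is an illustration, not taken from the source. $G_q$ is the set of passages needed to answer question $q$, $R_q^{(k)}$ is the set of top $k$ retrieved candidates, and $Q$ is a labeled question set:

```latex
\mathrm{Recall@}k(q) = \frac{\lvert G_q \cap R_q^{(k)} \rvert}{\lvert G_q \rvert},
\qquad
\overline{\mathrm{Recall@}k} = \frac{1}{\lvert Q \rvert} \sum_{q \in Q} \mathrm{Recall@}k(q)
```

The top $k$ candidates are always contained in the top $k+1$. Recall at $k$ therefore never falls as $k$ grows; it rises until it plateaus.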
The main knobs
- Top k. Fetching more candidates can only raise recall or leave it flat, but it also pulls in noise and adds reranking cost. Push k up until recall, measured on labeled questions, plateaus.
- Hybrid search. Combining keyword and vector retrieval catches both exact term matches and semantic matches, typically lifting recall above either method alone.
- Chunking and overlap. If the answer is split across chunks, no k saves it, so recall tuning loops back to how documents were divided.
- Embedding choice. A model better matched to your domain places relevant passages nearer the query in embedding space.
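The top k knob can be tuned with a simple sweep. The sketch below makes two assumptions. First, for each labeled question you already have a ranked list of passage IDs from your retriever, fetched at the largest k you might use. Second, you have a set of gold passage IDs for that question. `pick_k` and its `min_gain` threshold are hypothetical names chosen for illustration, and 0.01 is an arbitrary placeholder, not a recommended value.

```python
from typing import Dict, Iterable, List, Set


def recall_at_k(ranked: List[str], gold: Set[str], k: int) -> float:
    """Fraction of a question's gold passage IDs that appear in its top k results."""
    return len(gold.intersection(ranked[:k])) / len(gold)


def mean_recall_at_k(
    runs: Dict[str, List[str]], labels: Dict[str, Set[str]], k: int
) -> float:
    """Average recall@k over all labeled questions (each needs at least one gold ID)."""
    scores = [recall_at_k(runs.get(q, []), gold, k) for q, gold in labels.items()]
    return sum(scores) / len(scores)


def pick_k(
    runs: Dict[str, List[str]],
    labels: Dict[str, Set[str]],
    candidates: Iterable[int],
    min_gain: float = 0.01,  # hypothetical plateau threshold; tune for your data
) -> int:
    """Return the smallest candidate k after which the next larger k adds < min_gain recall."""
    ks = sorted(candidates)
    prev = mean_recall_at_k(runs, labels, ks[0])
    for prev_k, k in zip(ks, ks[1:]):
        cur = mean_recall_at_k(runs, labels, k)
        if cur - prev < min_gain:
            return prev_k  # recall has plateaued; larger k mostly adds noise
        prev = cur
    return ks[-1]  # recall never plateaued within the candidates tried


# Toy data: ranked passage IDs per question and the gold IDs each question needs.
runs = {"q1": ["a", "b", "c", "d", "e", "f"], "q2": ["x", "y", "z", "w", "v", "u"]}
labels = {"q1": {"c"}, "q2": {"y", "v"}}
print(pick_k(runs, labels, [1, 3, 5, 6]))  # 5
```

Mean recall here climbs from 0.0 at k=1 to 0.75 at k=3 and 1.0 at k=5, then stays flat. The sweep therefore stops at 5.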
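Hybrid search has to merge two ranked lists somehow, and the source does not say how. One common choice, swapped in here as an assumption, is reciprocal rank fusion. It scores each passage by summing 1/(c + rank) over every list the passage appears in. The constant 60 is a frequently used default, not a tuned value, and `reciprocal_rank_fusion` is a hypothetical helper name.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], rrf_constant: int = 60) -> List[str]:
    """Fuse ranked ID lists: each ID scores sum(1 / (rrf_constant + rank)) over its lists."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rrf_constant + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["d1", "d2", "d3"]  # e.g. from a keyword (BM25-style) index
vector_hits = ["d4", "d1", "d5"]  # e.g. from embedding similarity
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['d1', 'd4', 'd2', 'd3', 'd5']
```

Keyword search never returns d4, yet d4 lands in the fused top 3 because vector search ranks it first. Fusing by rank rather than by raw score avoids putting keyword and vector scores on a common scale.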
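The chunking point shows up clearly with a toy word-window splitter. This is a simplified sketch that measures chunks in whitespace-separated words. Real pipelines often count tokens or respect sentence boundaries instead, and `chunk_words` is a hypothetical helper.

```python
from typing import List


def chunk_words(text: str, size: int, overlap: int) -> List[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = "one two three four five six seven eight"
print(chunk_words(text, size=4, overlap=0))
# ['one two three four', 'five six seven eight']
print(chunk_words(text, size=4, overlap=2))
# ['one two three four', 'three four five six', 'five six seven eight']
```

Without overlap, the phrase "four five" straddles a boundary and no chunk contains it whole. With two words of overlap, the middle chunk holds it. Overlap buys that safety at the cost of a larger index and some near-duplicate hits.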
The recall/precision division of labor
A well-designed pipeline tunes the retriever for recall and lets a reranker handle precision. Fetch generously so the answer is present, then let a cross-encoder reranker push it to the top. Trying to make retrieval both high-recall and perfectly ordered usually sacrifices recall.
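This division of labor amounts to a two-stage pipeline. In the sketch below, `retrieve` and `rerank_score` are hypothetical callables. They stand in for your first-stage retriever and a cross-encoder. The defaults `fetch_k=100` and `final_k=5` are placeholders, not recommendations, and the toy retriever and word-overlap scorer are stand-ins, not real models.

```python
from typing import Callable, List


def retrieve_then_rerank(
    query: str,
    retrieve: Callable[[str, int], List[str]],
    rerank_score: Callable[[str, str], float],
    fetch_k: int = 100,
    final_k: int = 5,
) -> List[str]:
    """Stage 1 fetches a generous pool for recall; stage 2 reorders it for precision."""
    candidates = retrieve(query, fetch_k)
    # A cross-encoder would score each (query, passage) pair jointly; any such scorer fits here.
    ranked = sorted(candidates, key=lambda p: rerank_score(query, p), reverse=True)
    return ranked[:final_k]


# Hypothetical toy stand-ins: a retriever returning passages in a fixed, poorly
# ordered sequence, and a word-overlap scorer in place of a real cross-encoder.
corpus = [
    "cats sleep a lot",
    "dogs bark at night",
    "the capital of france is paris",
    "paris has many museums",
]


def toy_retrieve(query: str, k: int) -> List[str]:
    return corpus[:k]


def toy_score(query: str, passage: str) -> float:
    return float(len(set(query.split()) & set(passage.split())))


query = "capital of france"
print(retrieve_then_rerank(query, toy_retrieve, toy_score, fetch_k=4, final_k=1))
# ['the capital of france is paris']
print(retrieve_then_rerank(query, toy_retrieve, toy_score, fetch_k=2, final_k=1))
# ['cats sleep a lot']
```

The second call shows the ceiling at work. With `fetch_k=2` the answer passage is never fetched, so the reranker can only reorder wrong passages.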
Why it matters
Recall failures are silent. The system returns a confident answer built on the wrong context, so you must measure recall directly with labeled questions, not infer it from answer quality alone.
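Measuring recall directly can be as simple as listing, for each labeled question, the needed passages that missed the top k. The sketch below assumes two inputs: retrieved passage IDs per question, and gold passage IDs per question. `recall_misses` is a hypothetical name. Each entry it returns is a potential silent failure: a question whose answer would be generated from incomplete context.

```python
from typing import Dict, List, Set


def recall_misses(
    runs: Dict[str, List[str]], labels: Dict[str, Set[str]], k: int
) -> Dict[str, Set[str]]:
    """Map each labeled question to the gold passage IDs missing from its top k."""
    misses: Dict[str, Set[str]] = {}
    for question, gold in labels.items():
        # A question with no retrieval run at all counts as missing every gold passage.
        retrieved = set(runs.get(question, [])[:k])
        missing = gold - retrieved
        if missing:
            misses[question] = missing
    return misses


runs = {"q1": ["p1", "p2", "p3"], "q2": ["p7", "p8", "p9"]}
labels = {"q1": {"p2"}, "q2": {"p4", "p8"}, "q3": {"p5"}}
print(recall_misses(runs, labels, k=2))
# {'q2': {'p4'}, 'q3': {'p5'}}
```

Reviewing these misses by hand often points back to one of the knobs above, such as a too-small k, a term the vector index cannot match, or an answer split across chunks.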
Key idea
Recall is the ceiling on RAG quality, so tune the retriever for generous recall through top k, hybrid search, and chunking, then let a reranker supply precision.