Machine Learning

The RAG Pipeline End to End

How chunking, retrieval, reranking, and generation connect into one system.

6 min read · advanced

Grounding generation in retrieval

Retrieval-augmented generation, or RAG, answers a question by first fetching relevant passages and then asking a language model to answer from them. This grounds the model in current, source-backed text instead of relying only on what is stored in its trained weights.

The full flow

  • Index time: documents are chunked, each chunk is embedded, and vectors plus metadata land in a vector store.
  • Query time: the question is embedded, candidates are retrieved, optionally reranked, then passed as context to the generator.
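The two phases above can be sketched in a few lines of Python. This is a toy illustration, not a production design: the bag-of-words `embed` function stands in for a real embedding model, a plain list stands in for a vector store, and every name here (`chunk`, `embed`, `build_index`, `retrieve`) is made up for the example.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=12):
    """Split text into fixed-size word windows (a deliberately naive chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: bag-of-words counts. Assumption: a real system calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs):
    """Index time: chunk each document, embed each chunk, store vector plus metadata."""
    index = []
    for doc_id, text in docs.items():
        for n, piece in enumerate(chunk(text)):
            index.append({"doc_id": doc_id, "chunk_no": n, "text": piece, "vector": embed(piece)})
    return index


def retrieve(index, question, k=2):
    """Query time: embed the question and return the k most similar chunks."""
    q = embed(question)
    return sorted(index, key=lambda e: cosine(q, e["vector"]), reverse=True)[:k]


# Hypothetical two-document corpus, used only for illustration.
docs = {
    "handbook": "Refunds are issued within 14 days of purchase. Contact support to start a refund.",
    "faq": "Shipping takes three to five business days. Express shipping is available at checkout.",
}
index = build_index(docs)
top = retrieve(index, "How long does shipping take?", k=1)
print(top[0]["doc_id"], "->", top[0]["text"])
```

The retrieved chunks would then be passed, optionally after reranking, to the generator as context.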

Why each stage matters

  • Chunking sets what can be retrieved at all.
  • Retrieval decides which passages enter the context.
  • Reranking sharpens the order so the best passages lead.
  • Generation writes the answer and should cite its sources.
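As a rough sketch of the last two stages, the snippet below reorders candidates with a second scorer and builds a prompt that numbers each passage so the answer can cite it. The scorer `term_overlap` is a simple stand-in chosen for the example (real rerankers are usually learned models), and the prompt wording, field names, and file names are assumptions.

```python
def rerank(question, candidates, score_fn):
    """Re-order candidates by a second, usually more expensive, relevance score."""
    return sorted(candidates, key=lambda c: score_fn(question, c["text"]), reverse=True)


def term_overlap(question, passage):
    """Stand-in scorer: fraction of question terms that also appear in the passage."""
    q_terms = set(question.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def build_prompt(question, passages):
    """Number each passage so the generator can cite it as [1], [2], and so on."""
    context = "\n".join(
        f"[{i}] ({p['source']}) {p['text']}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using only the passages below and cite them by number.\n"
        "If the passages do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical retrieval output, in the order the retriever returned it.
candidates = [
    {"source": "faq.md", "text": "express shipping is available at checkout"},
    {"source": "policy.md", "text": "refunds are issued within 14 days of purchase"},
]
question = "when are refunds issued"
ordered = rerank(question, candidates, term_overlap)
prompt = build_prompt(question, ordered)
print(prompt)
```

Putting the best passage first and labeling each one with its source gives the generator something concrete to cite.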

Where it goes wrong

  • Missing context: retrieval fails to surface the passage that holds the answer, so the model guesses or hallucinates.
  • Distracting context: irrelevant passages crowd out the right one.
  • Ignored context: the model has the answer but does not use it.
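One way to triage wrong answers into these three buckets is sketched below. It assumes each test question has a known gold passage ID, and it uses a heuristic cutoff (`lead_k`) to separate "distracting" from "ignored" context. Both the function and the cutoff are illustrative assumptions, not a standard method.

```python
def diagnose(gold_id, retrieved_ids, answer_correct, lead_k=1):
    """Label a single example with a likely failure mode (heuristic triage only)."""
    if answer_correct:
        return "ok"
    if gold_id not in retrieved_ids:
        # The passage with the answer never reached the generator.
        return "missing context"
    if retrieved_ids.index(gold_id) >= lead_k:
        # The right passage was retrieved but sat behind irrelevant ones.
        return "distracting context"
    # The right passage led the context and the answer was still wrong.
    return "ignored context"


# Hypothetical chunk IDs for three wrong answers to the same question.
print(diagnose("c7", ["c1", "c2"], answer_correct=False))
print(diagnose("c7", ["c1", "c7"], answer_correct=False))
print(diagnose("c7", ["c7", "c1"], answer_correct=False))
```

Counting how many failures fall into each bucket suggests which stage to work on first: retrieval, reranking, or the generation prompt.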

A good RAG system tunes every stage and evaluates retrieval and generation separately, because a weak link anywhere caps the quality of the whole system.
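A minimal sketch of separate evaluation follows: recall@k scores the retriever on its own, and a simple exact-match accuracy scores the generator's answers. The evaluation records and chunk IDs are invented for the example, and exact match is only one of several possible answer metrics.

```python
def recall_at_k(retrieved_ids, gold_ids, k):
    """Share of gold passages that appear in the top k retrieved results."""
    gold = set(gold_ids)
    if not gold:
        return 0.0
    return len(gold & set(retrieved_ids[:k])) / len(gold)


def exact_match_accuracy(predictions, references):
    """Share of answers that match the reference after lowercasing and trimming."""
    pairs = list(zip(predictions, references))
    if not pairs:
        return 0.0
    hits = sum(p.strip().lower() == r.strip().lower() for p, r in pairs)
    return hits / len(pairs)


# Hypothetical evaluation set: retrieved IDs, gold IDs, model answer, reference answer.
examples = [
    {"retrieved": ["c3", "c1", "c9"], "gold": ["c1"], "pred": "14 days", "ref": "14 days"},
    {"retrieved": ["c4", "c5", "c2"], "gold": ["c2"], "pred": "Five days", "ref": "three to five days"},
]

retrieval_score = sum(recall_at_k(e["retrieved"], e["gold"], k=2) for e in examples) / len(examples)
generation_score = exact_match_accuracy([e["pred"] for e in examples], [e["ref"] for e in examples])
print(f"recall@2 = {retrieval_score:.2f}, exact match = {generation_score:.2f}")
```

Here the second example fails at retrieval (the gold chunk is ranked third, outside the top 2), so the wrong answer points to the retriever rather than the generator. A single end-to-end score would hide that distinction.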

Key idea

RAG chunks and indexes documents, then at query time retrieves, reranks, and feeds passages to a generator that answers from them, so the system is only as strong as its weakest stage.

Check yourself

1. What is the core promise of RAG?

2. What does the missing context failure mode mean?

3. Why evaluate retrieval and generation separately?