Machine Learning

Retrieval Augmented Generation Pipeline

Grounding a language model in retrieved documents.

6 min read · advanced

A language model knows only what was in its training data, and it can confidently invent facts, a failure called hallucination. Retrieval augmented generation, or RAG, addresses both problems by fetching relevant documents at question time and feeding them to the model as context.

The pipeline runs in clear stages:

  • Chunk and index, splitting source documents into passages, embedding each, and storing them in a vector database
  • Retrieve, embedding the user question and pulling the most similar passages with semantic search
  • Augment, inserting those passages into the prompt alongside the question
  • Generate, where the model answers using the supplied context
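The four stages above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not a production design: the "embedding" is a toy bag-of-words count vector standing in for a learned embedding model, the "vector database" is a plain list, and every function name here (chunk, embed, build_index, retrieve, augment) is hypothetical. The generate stage is left as a comment because no particular model API is assumed.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=40):
    """Split text into passages of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents, max_words=40):
    """Chunk and index: store (passage, vector) pairs in a list."""
    return [(p, embed(p)) for doc in documents for p in chunk(doc, max_words)]


def retrieve(index, question, k=2):
    """Retrieve: rank stored passages by similarity to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]


def augment(question, passages):
    """Augment: place the retrieved passages in the prompt with the question."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


docs = [
    "The Eiffel Tower is in Paris. It was completed in 1889.",
    "Mount Everest is the highest mountain above sea level.",
]
question = "When was the Eiffel Tower completed?"

index = build_index(docs)
passages = retrieve(index, question, k=1)
prompt = augment(question, passages)
print(prompt)
# Generate: the prompt would now be sent to a language model of your choice.
```

With a real embedding model the ranking would reflect meaning and not just shared words, but the shape of the pipeline stays the same.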

Because the answer is grounded in retrieved text, RAG sharply reduces hallucination, and the answer can cite the passages it drew on. It also lets you update knowledge by changing the document store, with no retraining of the model.

Design choices shape quality. Chunk size trades context against precision, since chunks too large dilute relevance and chunks too small lose meaning. The number of retrieved passages affects both coverage and prompt length.
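To make the trade-off concrete, the sketch below splits a synthetic 1,000-word document at several chunk sizes and reports how many chunks result and how many context words reach the prompt when k passages are retrieved. The helper names (chunk_words, context_words) and the chosen sizes are illustrative assumptions, and word counts stand in for tokens.

```python
def chunk_words(text, size):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def context_words(chunk_size, k):
    """Upper bound on context words added to the prompt for k retrieved chunks."""
    return chunk_size * k


# Synthetic 1,000-word document (placeholder text, not real data).
text = " ".join(f"w{i}" for i in range(1000))

for size in (50, 200, 500):
    chunks = chunk_words(text, size)
    print(f"size={size}: {len(chunks)} chunks, up to {context_words(size, k=4)} context words at k=4")
```

Small chunks give the retriever many precise candidates but little surrounding meaning in each; large chunks carry more meaning per passage but quickly inflate the prompt as k grows.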

The dominant failure mode is retrieval. If the right passage is not fetched, the model never sees it and cannot ground its answer in it. This makes a strong retriever and good chunking the heart of a reliable RAG system.
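One common way to check whether the retriever is the weak link is to measure how often the passage known to contain the answer appears among the top k results. The sketch below computes that hit rate. It assumes one known relevant passage per question, and the passage ids and results are made-up placeholders, not measurements.

```python
def hit_rate_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of questions whose relevant passage appears in the top k results."""
    hits = sum(
        1 for got, want in zip(retrieved_ids, relevant_ids) if want in got[:k]
    )
    return hits / len(relevant_ids)


# Hypothetical retriever output: ranked passage ids for three questions.
retrieved = [
    ["p3", "p1", "p7"],
    ["p2", "p9", "p4"],
    ["p5", "p6", "p8"],
]
# Hypothetical ground truth: the passage that answers each question.
relevant = ["p1", "p4", "p0"]

for k in (1, 2, 3):
    print(f"hit rate at {k}: {hit_rate_at_k(retrieved, relevant, k):.2f}")
```

A question that misses at every k, like the third one here, is a case the model cannot answer from context no matter how good the generation step is.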

Key idea

RAG retrieves relevant passages and feeds them as context so a language model answers from grounded text, reducing hallucination without retraining.

Check yourself

1. What problem does RAG primarily reduce?

2. How can you update a RAG system's knowledge?

3. What is the dominant failure mode in RAG?