Machine Learning

Retrieval Augmented Generation

Grounding a model's answers in documents fetched at query time.

5 min read · core

The problem it solves

A language model only knows what was in its training data, which can be stale and cannot include private facts it never saw. Retrieval augmented generation (RAG) addresses this by fetching relevant documents at query time and passing them to the model as context.

The pipeline

  • Documents are split into chunks and turned into embeddings
  • The embeddings are stored in a vector database, each alongside its chunk text
  • At query time the question is embedded and used to find the closest chunks
  • The retrieved chunks are placed in the prompt alongside the question
  • The model generates an answer grounded in that context
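
The steps above can be sketched end to end in a few lines. This is a minimal illustration, not a production setup: the `embed` function is a toy bag-of-words counter standing in for a learned embedding model, `VectorStore` is a hypothetical in-memory list rather than a real vector database, and the documents are made up for the example. The final generation step is left out; the sketch stops at the prompt that would be sent to the model.

```python
import math
import re
from collections import Counter


def chunk(text, size=12):
    """Split text into chunks of about `size` words (a deliberately naive splitter)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: word counts. An assumption standing in for a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory store holding (embedding, chunk text) pairs."""

    def __init__(self):
        self.items = []

    def add(self, chunk_text):
        self.items.append((embed(chunk_text), chunk_text))

    def search(self, query, k=2):
        """Embed the query and return the k most similar chunks."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(question, chunks):
    """Place the retrieved chunks in the prompt alongside the question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up documents for illustration only.
documents = [
    "The refund window for annual plans is 30 days from purchase.",
    "Support is available by email on weekdays between 9am and 5pm.",
    "Passwords must be rotated every 90 days for admin accounts.",
]

store = VectorStore()
for doc in documents:
    for piece in chunk(doc):
        store.add(piece)

question = "How long is the refund window?"
top_chunks = store.search(question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

Swapping `embed` for a real embedding model and `VectorStore` for an actual vector database would change the quality of the matches, but the shape of the pipeline stays the same.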

Why it helps

RAG lets a model draw on and cite fresh or proprietary information without retraining. It tends to reduce hallucination, though not eliminate it, because the answer is anchored to retrieved passages. Answer quality depends heavily on retrieval: if the wrong chunks come back, the model is answering from irrelevant context and the answer suffers. Good chunking, ranking, and prompt design all matter.
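
Because so much rides on retrieval, it can help to measure it separately from generation. The sketch below computes a simple hit rate at k: the fraction of test questions whose expected chunk shows up in the top k results. Everything here is assumed for illustration: the chunk ids, the labeled questions, and the `retrieve` function, which ranks by plain word overlap as a stand-in for vector search.

```python
import re

# Made-up chunks keyed by a hypothetical chunk id.
chunks = {
    "refunds": "Annual plans can be refunded within 30 days of purchase.",
    "support": "Support answers email on weekdays between 9am and 5pm.",
    "security": "Admin passwords must be rotated every 90 days.",
}


def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query):
    """Rank chunk ids by word overlap with the query (toy stand-in for vector search)."""
    q = tokens(query)
    return sorted(chunks, key=lambda cid: len(q & tokens(chunks[cid])), reverse=True)


def hit_rate_at_k(retrieve_fn, labeled_queries, k):
    """Fraction of queries whose expected chunk id appears in the top-k results."""
    hits = sum(
        1 for query, expected_id in labeled_queries
        if expected_id in retrieve_fn(query)[:k]
    )
    return hits / len(labeled_queries)


# Each pair is (question, id of the chunk that should be retrieved).
labeled_queries = [
    ("Can annual plans be refunded?", "refunds"),
    ("When does support answer email?", "support"),
    ("How often must admin passwords be rotated?", "security"),
    ("Who do I email to get my money back?", "refunds"),  # paraphrase, no shared words
]

rate_at_1 = hit_rate_at_k(retrieve, labeled_queries, k=1)
rate_at_2 = hit_rate_at_k(retrieve, labeled_queries, k=2)
print(f"hit rate at 1: {rate_at_1:.2f}")
print(f"hit rate at 2: {rate_at_2:.2f}")
```

In this toy set the last question shares no words with the refund chunk and matches the support chunk on "email" instead, so it misses at k=1. That is the failure the paragraph above describes: the model would be handed the support hours and asked about refunds.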

Key idea

RAG retrieves relevant chunks from a vector store and adds them to the prompt, grounding generation in fresh and specific information.

Check yourself

1. What problem does RAG primarily address?

2. What is stored in the vector database for RAG?

3. Why can poor retrieval hurt a RAG answer?