Retrieval Augmented Generation

The problem it solves

A language model only knows what was in its training data, which can be stale and usually lacks private facts. Retrieval augmented generation (RAG) addresses this by fetching relevant documents at query time and supplying them to the model as context.

The pipeline

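Documents are split into chunks, and each chunk is converted into an embedding vector
The embeddings are stored, together with their chunk text, in a vector database
At query time the question is embedded with the same model, and the chunks whose embeddings are most similar to it (for example by cosine similarity) are retrieved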
The retrieved chunks are placed in the prompt alongside the question
The model generates an answer grounded in that context
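
The steps above can be sketched in a few lines of Python. This is a minimal, self-contained illustration built on stated assumptions: `embed` is a toy bag-of-words counter standing in for a real embedding model, `InMemoryVectorStore` is a hypothetical brute-force store rather than the API of any real vector database, and `split_into_chunks`, `build_prompt` and the sample documents are invented for the example. The final generation step is left out. In a real system the assembled prompt would be sent to a language model.

```python
import math
import re
from collections import Counter


def split_into_chunks(document: str) -> list[str]:
    """Step 1a (hypothetical): split a document into chunks, here simply on blank lines."""
    return [part.strip() for part in document.split("\n\n") if part.strip()]


def embed(text: str) -> Counter:
    """Step 1b (toy stand-in): a bag-of-words term-count vector.
    A real system would call a neural embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Step 2 (hypothetical): keeps (embedding, chunk text) pairs in a plain list."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def add(self, chunk: str) -> None:
        self.items.append((embed(chunk), chunk))

    def search(self, query: str, k: int = 2) -> list[str]:
        """Step 3: embed the query the same way and return the k most similar chunks."""
        query_vec = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vec, item[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Step 4: place the retrieved chunks in the prompt alongside the question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Usage with made-up sample documents.
docs = [
    "Our refund window is 30 days from delivery.\n\n"
    "Refunds go back to the original payment method.",
    "Support is available on weekdays from 9am to 5pm.",
]
store = InMemoryVectorStore()
for doc in docs:
    for chunk in split_into_chunks(doc):
        store.add(chunk)

question = "How many days is the refund window?"
top_chunks = store.search(question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)  # Step 5 (not shown) would send this prompt to a language model.
```

Brute-force search compares the query with every stored vector, which is fine for a handful of chunks; production vector databases typically use approximate nearest-neighbour indexes so that search stays fast as the collection grows.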

Why it helps

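RAG lets a model use and cite fresh or proprietary information without retraining. It can reduce hallucination because the answer is anchored to retrieved passages rather than to the model's memory alone, though the model can still misread or ignore its context. Answer quality depends heavily on retrieval: if the wrong chunks come back, the model either lacks the facts it needs or is led astray by irrelevant ones. Good chunking, ranking, and prompt design all matter.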
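
Chunking is one of the levers named above. A common strategy, though not the only one and not one the text prescribes, is a sliding window with overlap, so that text near a chunk boundary appears in both neighbouring chunks. The sketch below assumes whitespace-separated words as a rough proxy for model tokens; `chunk_words` and its default sizes are hypothetical, illustrative values rather than recommendations.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` words (hypothetical helper).
    Consecutive windows share `overlap` words, so text near a boundary
    lands in both neighbouring chunks instead of being cut in half."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this window reached the end
            break
    return chunks


print(chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1))
# ['a b c d', 'd e f g', 'g h i j']
```

A larger overlap costs more storage and more embedding calls, but makes it less likely that a relevant passage is split across two chunks, neither of which matches the query well on its own.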

Key idea

RAG retrieves relevant chunks from a vector store and adds them to the prompt, grounding generation in fresh and specific information.
