The problem it solves
A language model only knows what was in its training data, which may be stale or lack private facts. Retrieval-augmented generation (RAG) addresses this by fetching relevant documents at query time and passing them to the model as context.
The pipeline
- Documents are split into chunks, and each chunk is turned into an embedding
- The chunks and their embeddings are stored in a vector database
- At query time the question is embedded and used to find the closest chunks
- The retrieved chunks are placed in the prompt alongside the question
- The model generates an answer grounded in that context
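The steps above can be sketched in a few lines of Python. This is a toy illustration, not a production pipeline: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and the names `embed`, `cosine`, `build_index`, `retrieve`, and `build_prompt` are all hypothetical. The final generation step is left out, since it depends on whichever model is used.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(chunks):
    """Store each chunk next to its embedding (assumption: a list replaces the vector database)."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=2):
    """Embed the question and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Place the retrieved chunks in the prompt alongside the question."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


# Made-up example chunks, for illustration only.
chunks = [
    "The refund window is 30 days from the delivery date.",
    "Support is available by email on weekdays.",
    "Shipping to Canada takes five to seven business days.",
]
index = build_index(chunks)
question = "How long is the refund window?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The resulting `prompt` string is what would be sent to the model. Only the retrieved chunk reaches it, which is why retrieval quality shapes the answer.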
Why it helps
RAG lets a model cite fresh or proprietary information without retraining. It tends to reduce hallucination because the answer is anchored to retrieved passages, though it does not eliminate it. Answer quality depends heavily on retrieval: if the wrong chunks come back, the answer suffers. Good chunking, ranking, and prompt design all matter.
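As one illustration of a chunking choice, here is a minimal sketch of fixed-size chunking with overlap. The function name `chunk_words` and the `size` and `overlap` parameters are hypothetical. Overlap is a common heuristic so that text cut at a chunk boundary also appears in the neighboring chunk; it is an assumption here, not something the pipeline above requires.

```python
def chunk_words(text, size=8, overlap=2):
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
pieces = chunk_words(sample, size=4, overlap=1)
print(pieces)
```

Larger chunks carry more context per retrieval hit but dilute the match; smaller chunks match more precisely but may lose surrounding meaning. Suitable values depend on the documents and are best chosen by testing.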
Key idea
RAG retrieves relevant chunks from a vector store and adds them to the prompt, grounding generation in fresh and specific information.