Beyond a single fetch
Classic RAG retrieves once and answers. That fails on questions needing several lookups, comparisons, or a decision about whether to search at all. Agentic RAG puts a language model in control of retrieval, letting it plan, search repeatedly, and judge its own progress before answering.
What the agent controls
- Whether to retrieve. For a simple question the agent may answer directly and skip search, saving latency.
- What to retrieve. It can rewrite the query, pick among several tools or indexes, and target a sub-question.
- When to stop. After reading results it decides whether it has enough or must search again with a refined query.
Together these choices form a loop: reason, retrieve, observe, repeat. It is often framed as a tool-using agent in which each retriever is a tool.
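The loop can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `Decision`, `run_agent`, `decide`, and `write_answer` are hypothetical names, and the `toy_*` functions stand in for a real language model and a real index so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Retriever = Callable[[str], List[str]]  # a tool: query -> passages


@dataclass
class Decision:
    action: str                  # "search" or "answer"
    tool: Optional[str] = None   # which retriever to call when searching
    query: Optional[str] = None  # the (possibly rewritten) query


def run_agent(
    question: str,
    decide: Callable[[str, List[str]], Decision],
    tools: Dict[str, Retriever],
    write_answer: Callable[[str, List[str]], str],
    max_steps: int = 4,
) -> str:
    context: List[str] = []
    for _ in range(max_steps):
        decision = decide(question, context)              # reason
        if decision.action == "answer":                   # enough context, or none needed
            break
        passages = tools[decision.tool](decision.query)   # retrieve
        context.extend(passages)                          # observe, then loop again
    return write_answer(question, context)


# Toy stand-ins (assumptions): a real agent would call a model in decide/write_answer.
def toy_decide(question: str, context: List[str]) -> Decision:
    # Skip search for very short questions; otherwise search once, then answer.
    if context or len(question.split()) < 4:
        return Decision("answer")
    return Decision("search", tool="docs", query=question.lower())


def toy_answer(question: str, context: List[str]) -> str:
    return f"{question} -> {len(context)} passage(s) used"


toy_tools: Dict[str, Retriever] = {"docs": lambda q: [f"passage about {q}"]}

print(run_agent("What is the refund policy for annual plans?",
                toy_decide, toy_tools, toy_answer))
```

The three decisions from the list above all live in `decide`: returning `"answer"` on the first call skips retrieval, the `tool` and `query` fields choose what to fetch, and returning `"answer"` later ends the loop.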
Multi-step questions
A query like "compare the revenue policies of two regions" needs two separate lookups and a synthesis. An agent decomposes it, retrieves for each part, then combines the results, something single-shot RAG cannot do.
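A rough sketch of decompose, retrieve, and combine follows. All names here (`answer_multi_part`, `toy_decompose`, `TOY_INDEX`, and so on) are hypothetical, the index contents are placeholders rather than real policy text, and in practice the decomposition and synthesis steps would be model calls.

```python
from typing import Callable, Dict, List


def answer_multi_part(
    question: str,
    decompose: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    synthesize: Callable[[str, Dict[str, List[str]]], str],
) -> str:
    sub_questions = decompose(question)                     # split into parts
    evidence = {sq: retrieve(sq) for sq in sub_questions}   # one lookup per part
    return synthesize(question, evidence)                   # combine


# Placeholder index (assumption): stands in for a real document store.
TOY_INDEX: Dict[str, List[str]] = {
    "revenue policy of region A": ["<placeholder passage for region A>"],
    "revenue policy of region B": ["<placeholder passage for region B>"],
}


def toy_decompose(question: str) -> List[str]:
    # A real agent would ask the model to split the question.
    return [f"revenue policy of {region}" for region in ("region A", "region B")]


def toy_retrieve(sub_question: str) -> List[str]:
    return TOY_INDEX.get(sub_question, [])


def toy_synthesize(question: str, evidence: Dict[str, List[str]]) -> str:
    lines = []
    for sub_question, passages in evidence.items():
        found = "; ".join(passages) if passages else "no evidence found"
        lines.append(f"{sub_question}: {found}")
    return "\n".join(lines)


print(answer_multi_part("Compare the revenue policies of two regions",
                        toy_decompose, toy_retrieve, toy_synthesize))
```

Keeping the evidence keyed by sub-question lets the synthesis step see which passages support which half of the comparison, and makes a missing lookup visible instead of silently dropped.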
Costs and risks
Each pass through the loop adds latency and token cost, and a confused agent can spiral into needless searches. Guardrails keep the loop bounded: a step limit caps the number of searches, and a self-check asks whether the newly retrieved context actually helped.
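One way to combine both guardrails is sketched below. It is an illustration under stated assumptions: `bounded_retrieve` and `refine` are hypothetical names, and the self-check is a simple novelty test (stop when a search returns nothing unseen), which is only a cheap proxy for asking a model whether the new context helped.

```python
from typing import Callable, List, Set, Tuple


def bounded_retrieve(
    query: str,
    retrieve: Callable[[str], List[str]],
    refine: Callable[[str, List[str]], str],
    max_steps: int = 3,
) -> Tuple[List[str], int]:
    context: List[str] = []
    seen: Set[str] = set()
    steps = 0
    while steps < max_steps:              # guardrail 1: hard step limit
        steps += 1
        added = 0
        for passage in retrieve(query):
            if passage not in seen:
                seen.add(passage)
                context.append(passage)
                added += 1
        if added == 0:                    # guardrail 2: the last search added nothing new
            break
        query = refine(query, context)    # try again with a refined query
    return context, steps


# Toy retriever (assumption): always returns the same two passages,
# so the second search adds nothing and the loop stops early.
def stale_retrieve(query: str) -> List[str]:
    return ["p1", "p2"]


def keep_query(query: str, context: List[str]) -> str:
    return query


context, steps = bounded_retrieve("q", stale_retrieve, keep_query, max_steps=10)
print(context, steps)
```

The step limit bounds the worst-case cost even when every search returns something new, while the novelty check ends the loop early in the common case where refinement has stopped paying off.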
Why it matters
Agentic RAG trades simplicity for the ability to handle open-ended, multi-hop questions, adapting its retrieval to what each question actually demands.
Key idea
Agentic RAG lets a model decide whether, what, and when to retrieve across multiple steps, handling multi-hop questions that single-shot retrieval cannot, at the cost of extra latency and the need for guardrails.