The Cost and Latency of Agent Loops
A single model call has one bounded cost. An agent loop makes many calls, and each call carries the context accumulated so far, so cost and latency add up fast across a multi-step task.
Where the cost comes from
- Unless a cached prefix can be reused, every loop step reprocesses the entire context, which grows as turns accumulate.
- Tool calls add their own network latency between model steps.
- Retries and replanning multiply the number of round trips.
Why it compounds
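If each step processes more tokens than the last, a ten-step task can cost far more than ten times a single step. When the context grows by a roughly fixed amount per step, the total number of tokens processed grows roughly quadratically with the number of steps. Latency stacks the same way: the user waits for model time, then tool time, then the next model time, step after step. Long agent runs can take many seconds or even minutes.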
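The compounding can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and all numbers (a 1,000-token starting context, 500 tokens added per step, 2.0 s per model call, 0.5 s per tool call) are assumptions chosen for the example, not measurements, and per-step time is treated as constant even though real model time tends to rise with context size.

```python
def total_input_tokens(steps: int, base_tokens: int, growth_per_step: int) -> int:
    """Total tokens read when step k re-reads base_tokens + k * growth_per_step (k starts at 0)."""
    return sum(base_tokens + k * growth_per_step for k in range(steps))


def total_latency_s(steps: int, model_s: float, tool_s: float) -> float:
    """Sequential wait, assuming each step is one model call followed by one tool call."""
    return steps * (model_s + tool_s)


# Illustrative numbers only (assumptions, not measurements).
single_step = total_input_tokens(1, base_tokens=1_000, growth_per_step=500)
ten_steps = total_input_tokens(10, base_tokens=1_000, growth_per_step=500)
ratio = ten_steps / single_step
wait_s = total_latency_s(10, model_s=2.0, tool_s=0.5)

print(single_step, ten_steps, ratio, wait_s)  # 1000 32500 32.5 25.0
```

Under these assumptions the ten-step run reads 32.5 times as many input tokens as one step, not 10 times, and the user waits 25 seconds even though no single call takes more than 2.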
Taming the budget
- Cap steps so a stuck agent cannot run forever.
- Trim context by summarizing old turns instead of carrying every token.
- Cache repeated prefixes and tool results to avoid redundant work.
- Use a smaller model for routine steps and reserve a strong one for hard decisions.
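Three of these levers (a step cap, context trimming, and tool-result caching) can be sketched in one small loop. This is a minimal sketch under stated assumptions, not a reference implementation: `run_agent`, `trim_context`, `decide`, `run_tool`, and both constants are hypothetical names invented for illustration, the "summary" is a placeholder string rather than a real model-written summary, and the cache assumes a tool call with the same input returns the same result. Prefix caching and model routing are not shown.

```python
from typing import Callable, Optional

MAX_STEPS = 8          # hypothetical cap on loop iterations
KEEP_RECENT_TURNS = 4  # hypothetical number of turns kept verbatim


def trim_context(turns: list[str], keep_recent: int = KEEP_RECENT_TURNS) -> list[str]:
    """Return a shortened view: one placeholder summary line plus the newest turns."""
    if len(turns) <= keep_recent:
        return list(turns)
    dropped = len(turns) - keep_recent
    # A real system would summarize the dropped turns; this stub only records how many.
    return [f"[summary of {dropped} earlier turns]"] + turns[-keep_recent:]


def run_agent(
    task: str,
    decide: Callable[[list[str]], Optional[str]],
    run_tool: Callable[[str], str],
    max_steps: int = MAX_STEPS,
) -> tuple[list[str], int]:
    """Run a capped loop; return the full history and the number of real tool calls."""
    history = [f"task: {task}"]
    tool_cache: dict[str, str] = {}
    tool_calls = 0
    for _ in range(max_steps):              # cap: a stuck agent stops here
        action = decide(trim_context(history))  # the model sees only the trimmed view
        if action is None:                  # the model signals that it is done
            break
        if action not in tool_cache:        # cache: identical calls run once
            tool_cache[action] = run_tool(action)
            tool_calls += 1
        history.append(f"{action} -> {tool_cache[action]}")
    return history, tool_calls


# Demo with a deliberately stuck policy that always asks for the same tool call.
history, calls = run_agent("demo", lambda ctx: "search docs", lambda action: "result")
print(len(history), calls)  # 9 1
```

The full history is kept locally and only the trimmed view is sent to the model, so trimming reduces per-step tokens without losing the record. In the demo the stuck policy is cut off after 8 steps and the repeated tool call executes only once.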
Key idea
Agent loops reprocess a growing context every step, so cost and latency compound and must be tamed with step caps, context trimming, caching, and smaller models for routine steps.