The Cost and Latency of Agent Loops

A single model call incurs its cost once. An agent loop makes many calls, each carrying the growing context, so cost and latency add up quickly across a multi-step task.

Where the cost comes from

Every loop step reprocesses the entire context, which grows as turns accumulate.
Tool calls add their own network latency between model steps.
Retries and replanning multiply the number of round trips.

Why it compounds

If each step processes more tokens than the last, a ten-step task can cost far more than ten times the first step, because every later step pays again for everything that came before it. Latency stacks the same way: the user waits for model time, then tool time, then the next model time, step after step. Long agent runs can take many seconds or even minutes.
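
To make the compounding concrete, here is a minimal sketch in Python. It is a toy model, not a measurement. It assumes the context starts at a hypothetical base_tokens and grows by a fixed tokens_per_step each turn, and that every step waits on one model call and one tool call of constant duration. The function names and numbers are illustrative assumptions.

```python
def total_tokens_processed(steps: int, base_tokens: int, tokens_per_step: int) -> int:
    """Sum the input tokens processed across all steps of a loop.

    Assumption: step k (0-indexed) reprocesses base_tokens + k * tokens_per_step.
    """
    return sum(base_tokens + k * tokens_per_step for k in range(steps))


def total_latency_seconds(steps: int, model_seconds: float, tool_seconds: float) -> float:
    """Sequential latency, assuming one model call and one tool call per step.

    Assumption: durations are constant; in practice model time tends to
    grow with context length, which makes the real total worse.
    """
    return steps * (model_seconds + tool_seconds)


single = total_tokens_processed(1, base_tokens=1000, tokens_per_step=1000)
ten = total_tokens_processed(10, base_tokens=1000, tokens_per_step=1000)
print(single, ten, ten / single)             # 1000 55000 55.0
print(total_latency_seconds(10, 2.0, 1.0))   # 30.0
```

Under these assumptions the ten-step run processes 55 times the tokens of the first step rather than 10 times, because the per-step cost grows linearly and the total therefore grows roughly with the square of the step count.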

Taming the budget

Cap steps so a stuck agent cannot run forever.
Trim context by summarizing old turns instead of carrying every token.
Cache repeated prefixes and tool results to avoid redundant work.
Use a smaller model for routine steps and reserve a strong one for hard decisions.
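
The first three controls above can be sketched together in one loop. This is an illustration under stated assumptions, not a reference implementation: decide and run_tool are hypothetical callables standing in for a model call and a tool call, the "summary" is a placeholder string where a real system would generate an actual summary, and the cache assumes a tool returns the same result for the same action. Model routing is left out to keep the sketch short.

```python
from typing import Callable


def trim_context(turns: list[str], keep_last: int) -> list[str]:
    """Keep the most recent turns and stand in one line for the older ones.

    Placeholder only: a real system would summarize the dropped turns.
    """
    if len(turns) <= keep_last:
        return list(turns)
    dropped = len(turns) - keep_last
    return [f"[summary of {dropped} earlier turns]"] + turns[dropped:]


def run_agent(
    task: str,
    decide: Callable[[list[str]], str],  # hypothetical model call: context -> action
    run_tool: Callable[[str], str],      # hypothetical tool call: action -> result
    max_steps: int = 5,
    keep_last: int = 4,
) -> list[str]:
    """Run a capped loop that sends a trimmed context and caches tool results."""
    turns = [task]
    tool_cache: dict[str, str] = {}
    for _ in range(max_steps):                    # step cap: a stuck agent stops here
        context = trim_context(turns, keep_last)  # trimmed view sent to the model
        action = decide(context)
        if action == "done":
            break
        if action not in tool_cache:              # only call the tool on a cache miss
            tool_cache[action] = run_tool(action)
        turns.append(f"{action} -> {tool_cache[action]}")
    return turns
```

The full history stays in turns while the model only sees the trimmed view, so trimming reduces per-step cost without discarding the record of what happened.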

Key idea

Agent loops reprocess a growing context every step, so cost and latency compound and must be tamed with caps, trimming, and caching.
