Machine Learning

The Cost and Latency of Agent Loops

Why looping agents are slow and expensive, and how to tame it.

A single model call has a single cost. An agent loop makes many calls, and each one carries the growing context, so cost and latency add up quickly across a multi-step task.

Where the cost comes from

  • Every loop step reprocesses the entire context, which grows as turns accumulate.
  • Tool calls add their own network latency between model steps.
  • Retries and replanning multiply the number of round trips.

Why it compounds

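If each step processes more tokens than the last, a ten-step task can cost far more than ten times a single step. When the context grows by a roughly constant amount per step, the total tokens processed grow roughly with the square of the step count. Latency stacks in a similar way: the user waits for model time, then tool time, then the next model call, step after step. Long agent runs can take many seconds or even minutes.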
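To make the compounding concrete, here is a minimal sketch that assumes the context starts at a fixed size and grows by a constant number of tokens per step, with no caching or trimming. The function name and all numbers (2,000 starting tokens, 500 tokens added per step) are illustrative assumptions, not measurements from any real system.

```python
def total_input_tokens(steps: int, base_tokens: int, growth_per_step: int) -> int:
    """Sum the input tokens processed across all steps of an agent loop.

    Assumes step k (0-indexed) reprocesses base_tokens + k * growth_per_step
    tokens, i.e. the context grows linearly and nothing is cached or trimmed.
    """
    return sum(base_tokens + k * growth_per_step for k in range(steps))


# Illustrative numbers only (assumptions, not benchmarks).
single_step = total_input_tokens(1, base_tokens=2000, growth_per_step=500)
ten_steps = total_input_tokens(10, base_tokens=2000, growth_per_step=500)

print(single_step)              # 2000
print(ten_steps)                # 42500
print(ten_steps / single_step)  # 21.25
```

Under these assumptions the ten-step run processes about 21 times the tokens of a single step, not 10 times, because every later step pays again for everything that came before it.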

Taming the budget

  • Cap steps so a stuck agent cannot run forever.
  • Trim context by summarizing old turns instead of carrying every token.
  • Cache repeated prefixes and tool results to avoid redundant work.
  • Use a smaller model for routine steps and reserve a strong one for hard decisions.
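The techniques above can be combined in one loop. The following is a rough sketch rather than a reference implementation: `run_agent`, `trim_context`, `pick_model`, the model names, the constants, and the rule that only the first step counts as "hard" are all hypothetical choices made for illustration. The model and tool are passed in as plain functions so the example runs without any external service, and only tool results are cached here (prefix caching is usually handled by the model provider and is not shown).

```python
from typing import Callable

MAX_STEPS = 8          # hard cap so a stuck agent cannot run forever (assumed value)
KEEP_RECENT_TURNS = 4  # older turns are collapsed into one summary line (assumed value)


def trim_context(turns: list[str], keep_recent: int = KEEP_RECENT_TURNS) -> list[str]:
    """Replace all but the most recent turns with a single summary placeholder."""
    if len(turns) <= keep_recent:
        return list(turns)
    older = turns[:-keep_recent]
    # Placeholder summary; a real system might ask a small model to summarize.
    summary = f"[summary of {len(older)} earlier turns]"
    return [summary] + turns[-keep_recent:]


def pick_model(is_hard: bool) -> str:
    """Route routine steps to a small model and hard decisions to a strong one."""
    return "strong-model" if is_hard else "small-model"  # hypothetical names


def run_agent(
    task: str,
    call_model: Callable[[str, list[str]], str],
    call_tool: Callable[[str], str],
) -> list[str]:
    """Run a capped loop; returns the full turn history."""
    turns = [task]
    tool_cache: dict[str, str] = {}
    for step in range(MAX_STEPS):
        context = trim_context(turns)  # send a trimmed view, keep full history locally
        # Assumption: only the first (planning) step needs the strong model.
        action = call_model(pick_model(is_hard=(step == 0)), context)
        if action == "done":
            break
        if action not in tool_cache:   # skip the tool round trip on repeats
            tool_cache[action] = call_tool(action)
        turns.append(f"{action} -> {tool_cache[action]}")
    return turns


# Usage with stand-in functions that simulate a stuck agent.
tool_calls: list[str] = []


def fake_model(model_name: str, context: list[str]) -> str:
    return "lookup"  # never returns "done"


def fake_tool(action: str) -> str:
    tool_calls.append(action)
    return "result"


history = run_agent("find the answer", fake_model, fake_tool)
print(len(history))     # 9: the task plus one turn per capped step
print(len(tool_calls))  # 1: repeated tool calls were served from the cache
```

The stand-in model never finishes, so the step cap is what ends the run, and the cache means the tool is actually invoked only once across the eight steps.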

Key idea

Agent loops reprocess a growing context every step, so cost and latency compound and must be tamed with caps, trimming, and caching.

Check yourself

1. Why does an agent loop cost more than a single call?

2. Which technique reduces per-step token cost?

3. How can a smaller model help the budget?