Machine Learning

Cost Control in Agent Loops

How to bound token spend and latency when an agent runs many model calls.

5 min read · core

Loops multiply cost

Every turn of an agent loop is one or more model calls, and the prompt grows as history accumulates, so each later call costs more than the one before it. Without limits, a single task can fire dozens of expensive calls, which makes cost control a first-class design concern.
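To make the multiplication concrete, here is a minimal sketch of the arithmetic. The numbers (a 1,000-token starting prompt that grows by 500 tokens per turn) are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
def cumulative_input_tokens(base_tokens: int, tokens_added_per_turn: int, turns: int) -> int:
    """Total input tokens processed across all turns when history is never trimmed."""
    total = 0
    prompt = base_tokens
    for _ in range(turns):
        total += prompt                   # the whole prompt is sent again on every call
        prompt += tokens_added_per_turn   # history grows before the next call
    return total


# Illustrative numbers only.
growing = cumulative_input_tokens(1000, 500, 10)  # 1000 + 1500 + ... + 5500 = 32500
flat = cumulative_input_tokens(1000, 0, 10)       # 10 calls at a fixed 1000 tokens = 10000
print(growing, flat)
```

Under these assumptions the growing prompt costs more than three times as many input tokens as a fixed-size prompt over the same ten turns, and the gap widens as the loop runs longer.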

Levers to pull

  • Step cap: hard limit on loop iterations so it cannot run forever.
  • Context trimming: summarize or drop old turns to keep the prompt small.
  • Model tiering: use a cheap model for routine steps and a strong one only for hard decisions.
  • Caching: reuse stable prefixes so repeated context is not reprocessed.
  • Early stop: end as soon as the answer is confident enough.
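Several of these levers can be combined in one loop. The sketch below is one possible arrangement, not a reference implementation: `call_model`, the model names, the confidence score, and the `is_hard` rule are all hypothetical stand-ins for whatever your stack provides. Caching is left out because it depends on the model provider.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: (model_name, messages) -> (reply_text, confidence in [0, 1]).
ModelFn = Callable[[str, List[str]], Tuple[str, float]]


def trim_history(history: List[str], keep_last: int) -> List[str]:
    """Context trimming: keep the task statement plus the most recent turns."""
    if len(history) <= keep_last + 1:
        return list(history)
    return [history[0]] + history[-keep_last:]


def run_agent(
    task: str,
    call_model: ModelFn,
    max_steps: int = 8,
    keep_last: int = 4,
    confidence_threshold: float = 0.9,
    is_hard: Callable[[int, List[str]], bool] = lambda step, history: False,
) -> Tuple[str, int]:
    """Return (final_reply, steps_used)."""
    history = [task]
    for step in range(1, max_steps + 1):  # step cap: the loop cannot run forever
        # Model tiering: placeholder model names, chosen by a caller-supplied rule.
        model = "strong-model" if is_hard(step, history) else "cheap-model"
        reply, confidence = call_model(model, trim_history(history, keep_last))
        history.append(reply)
        if confidence >= confidence_threshold:  # early stop
            return reply, step
    return history[-1], max_steps  # cap reached: return the best effort so far
```

Dropping old turns is the bluntest form of trimming; replacing them with a summary keeps more information at the price of an extra model call.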

Measuring it

Track tokens and calls per task, not just per request. A loop that looks cheap per call can be costly across an hour of work. Set a budget per task and abort gracefully when it is exceeded.
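A per-task budget can be a small counter that every call reports into. The following is a minimal sketch; the class and function names are hypothetical, and the token counts are assumed to come from whatever usage figures your model API returns.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


class BudgetExceeded(Exception):
    """Raised when a task goes over its token or call budget."""


@dataclass
class TaskBudget:
    max_tokens: int
    max_calls: int
    tokens_used: int = 0
    calls_made: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one call's usage to the task totals and enforce the limits."""
        self.calls_made += 1
        self.tokens_used += input_tokens + output_tokens
        if self.tokens_used > self.max_tokens or self.calls_made > self.max_calls:
            raise BudgetExceeded(
                f"{self.tokens_used} tokens over {self.calls_made} calls"
            )


def run_with_budget(usages: Iterable[Tuple[int, int]], budget: TaskBudget) -> Dict[str, object]:
    """Consume (input_tokens, output_tokens) pairs, one per call, until done or over budget."""
    completed = 0
    try:
        for input_tokens, output_tokens in usages:
            budget.record(input_tokens, output_tokens)
            completed += 1
    except BudgetExceeded:
        # Graceful abort: report what was finished instead of crashing the task.
        return {"status": "aborted", "completed_calls": completed, "tokens_used": budget.tokens_used}
    return {"status": "done", "completed_calls": completed, "tokens_used": budget.tokens_used}
```

Because this sketch records usage after each call, the call that crosses the limit has already been paid for. A stricter variant would estimate the next call's size and refuse to start it.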

The balance

Aggressive trimming can drop facts the agent needs, hurting quality. The goal is the cheapest path that still reaches a correct answer, found by measuring real tasks rather than guessing.
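One way to turn "measure rather than guess" into a rule is to run each candidate configuration over the same set of real tasks, then pick the cheapest one that still clears a quality bar. The sketch below assumes you have already collected those statistics; the names and the placeholder figures in the usage lines are hypothetical, not results.

```python
from typing import List, NamedTuple, Optional


class RunStats(NamedTuple):
    config: str                  # label for the configuration that was measured
    mean_tokens_per_task: float
    success_rate: float          # fraction of tasks answered correctly


def cheapest_acceptable(results: List[RunStats], min_success_rate: float) -> Optional[RunStats]:
    """Return the lowest-cost configuration that meets the quality bar, or None."""
    acceptable = [r for r in results if r.success_rate >= min_success_rate]
    if not acceptable:
        return None
    return min(acceptable, key=lambda r: r.mean_tokens_per_task)


# Placeholder figures for illustration only.
measured = [
    RunStats("keep_last=2", 4000, 0.70),
    RunStats("keep_last=6", 9000, 0.92),
    RunStats("no trimming", 20000, 0.93),
]
choice = cheapest_acceptable(measured, min_success_rate=0.90)
```

With these placeholder figures the most aggressive trimming is cheapest but falls below the bar, so the moderate setting is selected.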

Key idea

Cost control bounds agent loops with step caps, context trimming, model tiering, caching, and early stopping. Measure it in tokens per task rather than per call, and aim for the cheapest path that still reaches a correct answer.

Check yourself

1. Why does an agent loop tend to be costly?

2. What does model tiering mean for cost control?