Context Window Management
Every model has a maximum context window, a fixed token budget for everything it can see at once. In a long agent run that budget fills, and what stays in it directly shapes the model's behavior.
What competes for space
- The system prompt and tool schemas, which are usually fixed overhead.
- The growing history of messages, thoughts, and tool observations.
- Retrieved documents or memories pulled in for the current step.
Strategies when it fills
- Summarize older turns into a compact recap that preserves key facts.
- Drop irrelevant or stale messages entirely.
- Retrieve on demand so large references live outside context and enter only when needed.
The lost in the middle effect
Models attend unevenly across a long context, often weighting the start and end more than the middle. So placement matters: critical instructions near the edges, bulky background trimmed or retrieved. Good context management is not just fitting under the limit but arranging what remains so the model actually uses the parts that matter.
Key idea
Context management keeps the prompt under its token limit by summarizing, dropping, and retrieving, and arranges what remains so the model uses it.