Context Window Management

Deciding what to keep, drop, or summarize as the prompt grows.

Every model has a maximum context window: a fixed token budget covering everything it can see at once. In a long agent run that budget fills up, and whatever remains in it directly shapes the model's behavior.

What competes for space

  • The system prompt and tool schemas, which are usually fixed overhead.
  • The growing history of messages, thoughts, and tool observations.
  • Retrieved documents or memories pulled in for the current step.
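
To make the competition for space concrete, here is a minimal Python sketch that tallies each category against a budget. The names `estimate_tokens` and `context_usage` are hypothetical, and the word-count estimate is only a rough stand-in for a real tokenizer, which would give different numbers.

```python
def estimate_tokens(text: str) -> int:
    """Rough stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def context_usage(system_prompt: str, tool_schemas: list[str],
                  history: list[str], retrieved: list[str],
                  limit: int) -> dict[str, int]:
    """Report how many estimated tokens each category consumes."""
    # Fixed overhead: system prompt plus tool schemas.
    fixed = estimate_tokens(system_prompt) + sum(estimate_tokens(s) for s in tool_schemas)
    # Growing history: messages, thoughts, and tool observations.
    hist = sum(estimate_tokens(m) for m in history)
    # Material retrieved for the current step.
    docs = sum(estimate_tokens(d) for d in retrieved)
    return {
        "fixed_overhead": fixed,
        "history": hist,
        "retrieved": docs,
        "remaining": limit - (fixed + hist + docs),
    }


usage = context_usage(
    system_prompt="You are a careful assistant.",
    tool_schemas=["search(query: str)"],
    history=["Find the refund policy.", "Calling search now."],
    retrieved=["Refunds are accepted within thirty days."],
    limit=50,  # illustrative budget, far smaller than any real window
)
print(usage)
```

Only the history and retrieved entries grow during a run, so those are the ones a management strategy would normally target.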

Strategies when it fills

  • Summarize older turns into a compact recap that preserves key facts.
  • Drop irrelevant or stale messages entirely.
  • Retrieve on demand so large references live outside context and enter only when needed.
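
The three strategies above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `(text, is_stale)` message format, the word-count token estimate, and especially `naive_summarize`, a placeholder where a real system would more likely ask a model to write the recap.

```python
def estimate_tokens(text: str) -> int:
    """Rough stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def naive_summarize(messages: list[str]) -> str:
    """Placeholder recap: keeps the first five words of each message."""
    return "Recap: " + " | ".join(" ".join(m.split()[:5]) for m in messages)


def compact_history(history: list[tuple[str, bool]], budget: int,
                    keep_recent: int = 2) -> list[str]:
    """Drop stale messages, then summarize older turns if still over budget.

    The result is not guaranteed to fit; a caller may need to repeat the step.
    """
    # Strategy: drop messages flagged as stale or irrelevant.
    live = [text for text, is_stale in history if not is_stale]
    if sum(estimate_tokens(t) for t in live) <= budget:
        return live
    # Strategy: summarize everything except the most recent turns.
    split = max(len(live) - keep_recent, 0)
    older, recent = live[:split], live[split:]
    if not older:
        return recent
    return [naive_summarize(older)] + recent


def retrieve_on_demand(store: dict[str, str], needed: list[str]) -> list[str]:
    """Strategy: references live outside context and enter only when requested."""
    return [store[key] for key in needed if key in store]


history = [
    ("user asks about refunds for order 1234 placed last week", False),
    ("tool returned an unrelated debug log line", True),
    ("assistant confirms the refund policy allows thirty days", False),
    ("user wants the refund sent to original card", False),
]
compacted = compact_history(history, budget=24)
store = {"refund_policy": "Refunds within thirty days.", "shipping": "Ships in two days."}
references = retrieve_on_demand(store, ["refund_policy"])
```

In this run the stale tool message is dropped first, and because the remaining turns still exceed the budget, the oldest one is folded into a recap while the two most recent turns stay verbatim.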

The lost-in-the-middle effect

Models attend unevenly across a long context, often weighting the start and end more heavily than the middle. So placement matters: put critical instructions near the edges, and trim bulky background or retrieve it on demand. Good context management is not just fitting under the limit but arranging what remains so the model actually uses the parts that matter.
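
One way to act on this is to fix the order in which a prompt is assembled. The sketch below is a hypothetical helper, not a standard API: `arrange_context` places the instructions first and the current task last, and assumes `background` is already sorted with the most relevant item first so that trimming removes the least useful material.

```python
def arrange_context(instructions: str, background: list[str], task: str,
                    max_background: int = 3) -> str:
    """Put instructions at the start, trimmed background in the middle, task at the end."""
    # Trim bulky background; assumes the list is ordered most relevant first.
    trimmed = background[:max_background]
    # Critical content sits at the two edges of the prompt.
    parts = [instructions, *trimmed, task]
    return "\n\n".join(parts)


prompt = arrange_context(
    instructions="Always answer in JSON.",
    background=["doc A", "doc B", "doc C"],
    task="Summarize doc A.",
    max_background=2,
)
```

Whether this ordering helps a particular model is an empirical question, so it is worth checking against your own prompts rather than assuming it.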

Key idea

Context management keeps the prompt under its token limit by summarizing, dropping, and retrieving, and arranges what remains so the model uses it.

Check yourself

1. What is the context window?

2. What is the lost-in-the-middle effect?

3. Which strategy keeps large references out of context until needed?