Machine Learning

Context Window Budgeting

Spending a finite token budget on what matters most.

5 min read · core

A finite budget

A model can attend to only a limited number of tokens at once; that limit is called the context window. Everything you send competes for the same space: the system prompt, instructions, examples, retrieved documents, the user message, and the room left for the answer.

What fills the window

  • Instructions and persona from the system prompt.
  • Examples, if you use few-shot prompting.
  • Retrieved context pulled in for grounding.
  • Conversation history from earlier turns.
  • Reserved output room for the model's reply.
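
To make the accounting concrete, here is a minimal sketch of a budget tally. The window size and the per-part token counts are made-up numbers for illustration, and `remaining_budget` is a hypothetical helper, not part of any library. A real system would measure each part with the model's own tokenizer.

```python
# Assumed window size in tokens (illustrative, not tied to any real model).
CONTEXT_WINDOW = 8000


def remaining_budget(window, parts, reserved_output):
    """Return the tokens left after every prompt part and the reserved output."""
    used = sum(parts.values())
    return window - used - reserved_output


# Hypothetical token counts for each component that fills the window.
parts = {
    "system_prompt": 400,
    "examples": 900,
    "retrieved_context": 3000,
    "history": 2200,
    "user_message": 150,
}

left = remaining_budget(CONTEXT_WINDOW, parts, reserved_output=1000)
print(left)  # 350
```

A negative result would mean the prompt no longer fits and something has to be trimmed before sending.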

Budgeting strategies

  • Trim history by summarizing old turns instead of resending them.
  • Rank retrieval so only the most relevant passages are included.
  • Compress examples to the fewest that still teach the pattern.
  • Reserve output tokens so a long answer is not truncated.
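
Two of these strategies, ranking retrieval and trimming history, can be sketched as follows. This is an illustration under stated assumptions: `estimate_tokens` is a crude whitespace word count standing in for a real tokenizer, the relevance scores are assumed to come from some retriever, and `summarize` is a caller-supplied function (in practice often another model call).

```python
def estimate_tokens(text):
    # Crude stand-in: whitespace word count. Real tokenizers count differently.
    return len(text.split())


def select_passages(scored_passages, budget):
    """Greedily keep the highest-scoring passages that still fit in the budget.

    scored_passages is a list of (score, text) pairs.
    """
    chosen, used = [], 0
    for _score, text in sorted(scored_passages, key=lambda p: p[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen


def trim_history(turns, budget, summarize):
    """Keep the newest turns verbatim and replace older ones with one summary.

    The summary's own token cost is not counted against the budget here.
    """
    kept, used = [], 0
    for turn in reversed(turns):  # walk from the newest turn backwards
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    older = turns[: len(turns) - len(kept)]
    if older:
        return [summarize(older)] + kept
    return kept
```

The greedy selection skips a passage that does not fit and keeps trying lower-ranked ones, which favors filling the budget over strict rank order. Whether that trade-off is right depends on the application.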

Why it matters

Overflowing the window forces truncation, which can silently drop important context and degrade answers. Models can also lose focus on content buried in the middle of a very long prompt, so placement and brevity both help.
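
One way to act on the placement point is to order retrieved passages so the strongest sit near the start and end of the prompt and the weakest land in the middle. The sketch below is one possible heuristic, not an established rule; `order_for_edges` is a hypothetical helper and assumes the input is already ranked best-first.

```python
def order_for_edges(ranked):
    """Reorder a best-first list so top items sit at both ends.

    Items alternate between the front and the back, which leaves the
    lowest-ranked items in the middle of the result.
    """
    front, back = [], []
    for i, item in enumerate(ranked):
        if i % 2 == 0:
            front.append(item)
        else:
            back.append(item)
    return front + back[::-1]


# Labels stand in for passages, ranked from most to least relevant.
ordered = order_for_edges(["p1", "p2", "p3", "p4", "p5"])
print(ordered)  # ['p1', 'p3', 'p5', 'p4', 'p2']
```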

Key idea

The context window is a finite token budget shared by every part of the prompt, so trimming history, ranking retrieval, and reserving output room keep the most useful content in view.

Check yourself

1. What is the context window?

2. A good budgeting strategy is to