Context Length and Tokens

Why the context window is measured in tokens and what fills it up.

The window is a token budget

A model's context length is the maximum number of tokens it can attend to at once. It covers the prompt and the generated reply together, so they share one budget.
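The shared budget comes down to simple arithmetic. Here is a minimal sketch, assuming a hypothetical 8,192-token window and a 6,000-token prompt; the function name and the numbers are illustrative and do not come from any particular model or library.

```python
def remaining_output_budget(context_length: int, prompt_tokens: int) -> int:
    """Return how many tokens are left for the reply once the prompt is counted."""
    if prompt_tokens > context_length:
        raise ValueError("prompt alone exceeds the context length")
    return context_length - prompt_tokens


# Hypothetical numbers: an 8,192-token window and a 6,000-token prompt.
print(remaining_output_budget(8192, 6000))  # 2192
```

Every token added to the prompt is one the reply can no longer use.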

What competes for space

  • The system message and instructions.
  • Conversation history in a chat.
  • Retrieved documents in a retrieval pipeline.
  • The model's own growing output.

When the total would exceed the limit, something must be dropped, summarized, or truncated.
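One common way to handle this is to keep the system message and drop the oldest conversation turns until the total fits. The sketch below is illustrative only: `count_tokens` is a stand-in that counts whitespace-separated words (an assumption, not a real tokenizer), and `fit_to_budget` is a hypothetical helper name.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Stand-in counter (assumption): one token per whitespace-separated word.
    # A real system would call its model's tokenizer here instead.
    return len(text.split())


def fit_to_budget(system: str, history: List[str], budget: int) -> List[str]:
    """Keep the system message, then drop the oldest turns until the total fits."""
    kept = list(history)
    total = count_tokens(system) + sum(count_tokens(m) for m in kept)
    while kept and total > budget:
        total -= count_tokens(kept.pop(0))  # drop the oldest turn first
    if total > budget:
        raise ValueError("system message alone exceeds the budget")
    return [system] + kept


history = ["hello there friend", "how are you today", "fine thanks"]
print(fit_to_budget("be brief", history, budget=9))
# ['be brief', 'how are you today', 'fine thanks']
```

Dropping the oldest turns is only one policy. Summarizing old turns or truncating long documents trades off differently between fidelity and space.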

Why tokens, not characters

Because the model operates on tokens, the limit is naturally a token count. The same number of characters can map to very different token counts depending on the language and on the tokenizer's fertility, meaning the average number of tokens it produces per word.
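A toy illustration of the gap between characters and tokens: the sketch below assumes a byte-level tokenizer with no learned merges, so each UTF-8 byte is one token. Real tokenizers merge frequent byte sequences and give different counts, but the mismatch between character count and token count has the same cause.

```python
def byte_level_token_count(text: str) -> int:
    # Toy tokenizer (assumption): one token per UTF-8 byte, with no merges.
    return len(text.encode("utf-8"))


english = "hello"
japanese = "こんにちは"

# Both strings are 5 characters long, but their byte-level token counts differ.
print(len(english), byte_level_token_count(english))    # 5 5
print(len(japanese), byte_level_token_count(japanese))  # 5 15
```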

Practical pressure

Longer contexts also cost more to process and add latency, because standard self-attention compares every token with every other token. Filling the window is therefore rarely free, even when everything fits. Good systems budget context deliberately rather than dumping everything in.
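As a rough sketch of why length is costly, assume standard dense self-attention, where each token is scored against every token in the window. The count below ignores causal masking and implementation-level optimizations, so treat it as a scaling illustration rather than a cost model.

```python
def attention_pairs(n_tokens: int) -> int:
    """Number of query-key scores in one dense self-attention matrix."""
    return n_tokens * n_tokens


# Doubling the context length quadruples the number of scores.
for n in (1_000, 2_000, 4_000):
    print(n, attention_pairs(n))
# 1000 1000000
# 2000 4000000
# 4000 16000000
```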

Key idea

Context length is a shared token budget for prompt plus output, and everything from history to retrieved text competes for that limited space.

Check yourself

1. What does the context window measure?

2. What happens when content would exceed the context limit?