Context Length and Tokens

The window is a token budget

A model's context length is the maximum number of tokens it can attend to at once. It covers the prompt and the generated reply together, so they share one budget.
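The shared budget comes down to simple subtraction. Here is a minimal sketch; the function name and the numbers are hypothetical, chosen only for illustration, and do not describe any particular model.

```python
def max_reply_tokens(context_length: int, prompt_tokens: int) -> int:
    """Tokens left for the reply once the prompt has been counted."""
    if prompt_tokens > context_length:
        raise ValueError("prompt alone exceeds the context length")
    return context_length - prompt_tokens


# Hypothetical numbers: an 8192-token window and a 6000-token prompt.
remaining = max_reply_tokens(context_length=8192, prompt_tokens=6000)
print(remaining)  # 2192
```

Every token added to the prompt is one the reply can no longer use.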

What competes for space

The system message and instructions.
Conversation history in a chat.
Retrieved documents in a retrieval pipeline.
The model's own growing output.

When the total would exceed the limit, something must be dropped, summarized, or truncated.
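One common way to handle this is to drop the oldest conversation turns first. The sketch below assumes that policy; it is one option among several, not the only reasonable one. The helper count_tokens is a hypothetical stand-in that counts whitespace-separated words, and a real system would call the model's own tokenizer instead.

```python
def count_tokens(text: str) -> int:
    # Stand-in for a real tokenizer (assumption): counts whitespace-separated words.
    return len(text.split())


def trim_history(system: str, history: list[str], budget: int,
                 reserve_for_reply: int) -> list[str]:
    """Keep the most recent messages that fit after the system message
    and the space reserved for the reply have been subtracted."""
    available = budget - reserve_for_reply - count_tokens(system)
    kept: list[str] = []
    for message in reversed(history):  # walk from newest to oldest
        cost = count_tokens(message)
        if cost > available:
            break  # this message and everything older is dropped
        kept.append(message)
        available -= cost
    kept.reverse()  # restore chronological order
    return kept


history = ["one two three", "four five", "six seven eight nine"]
print(trim_history("be brief", history, budget=12, reserve_for_reply=4))
# ['four five', 'six seven eight nine']
```

Reserving reply space before admitting any history keeps the model's own output from being the part that gets squeezed out.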

Why tokens, not characters

Because the model operates on tokens, the limit is naturally a token count. The same number of characters can consume very different amounts of the budget depending on the language and on the tokenizer's fertility, meaning how many tokens it produces per word.
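As a rough illustration, the sketch below counts UTF-8 bytes. That is what a purely byte-level tokenizer with no learned merges would count, and it is used here only as an assumed stand-in. Real subword tokenizers merge bytes into larger units, so their actual counts will differ.

```python
def byte_level_token_count(text: str) -> int:
    # Assumption: one token per UTF-8 byte, with no merges applied.
    return len(text.encode("utf-8"))


# Three strings of five characters each.
samples = ["hello", "h\u00e9llo", "こんにちは"]
for s in samples:
    print(len(s), byte_level_token_count(s))
# 5 5
# 5 6
# 5 15
```

The character count is the same on every line, but the budget consumed is not.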

Practical pressure

Longer contexts also cost more to process and can slow generation, because attention has more tokens to cover, so filling the window is rarely free even when everything fits. Good systems budget context deliberately rather than dumping everything in.

Key idea

Context length is a shared token budget for prompt plus output, and everything from history to retrieved text competes for that limited space.

