Tokens are the billing unit
API pricing is almost always per token, with input and output tokens counted and priced separately. Providers rarely charge per word or per character, so understanding tokens is the key to understanding your bill.
Input versus output
- Input tokens cover your prompt, system message, and any context you send.
- Output tokens cover what the model generates.
- Output is usually priced higher per token than input.
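As a rough illustration of the split above, the sketch below computes the cost of one request from separate input and output rates. The `Pricing` class, the `request_cost` function, and the rates in `EXAMPLE_PRICING` are hypothetical names and made-up numbers, not any provider's real prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per one million tokens (hypothetical structure)."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Return the USD cost of one request, billing input and output separately."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens * pricing.input_per_million / 1_000_000
    output_cost = output_tokens * pricing.output_per_million / 1_000_000
    return input_cost + output_cost


# Made-up rates for illustration only; substitute your provider's published prices.
EXAMPLE_PRICING = Pricing(input_per_million=2.0, output_per_million=8.0)
```

With these example rates, a short answer to a long prompt can still be dominated by input cost, so it helps to look at both terms rather than only the output.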
Rules of thumb
For English prose, a token is often around three to four characters, or loosely three quarters of a word. Other languages and code often split into more tokens for the same amount of text, so always measure rather than assume.
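The rules of thumb can be turned into a quick back-of-the-envelope estimator. This is a minimal sketch that assumes English prose; the function names and the default ratios (4 characters per token, 0.75 words per token) are illustrative choices taken from the heuristics above, not a real tokenizer.

```python
import math


def estimate_tokens_from_chars(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (assumes English prose)."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token)


def estimate_tokens_from_words(word_count: int, words_per_token: float = 0.75) -> int:
    """Rough token estimate from word count (assumes English prose)."""
    if words_per_token <= 0:
        raise ValueError("words_per_token must be positive")
    return math.ceil(word_count / words_per_token)
```

Treat the result as a planning figure only. For code or non-English text, the ratios would need to be measured against the actual tokenizer.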
Controlling spend
- Trim redundant context and long system prompts.
- Cap output length when you can.
- Cache reusable prefixes if the provider supports it.
- Measure with the tokenizer for the model you actually call, since token counts and estimates drift across models.
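One way to apply the trimming advice is to keep only the most recent context that fits a token budget. The sketch below is an assumption about how that might look: `trim_to_budget` and `whitespace_count` are hypothetical helpers, and the counter is passed in as a parameter so the real tokenizer can replace the stand-in.

```python
from typing import Callable, List


def trim_to_budget(
    chunks: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Keep the most recent chunks whose combined token count fits max_tokens."""
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first chunk that would not fit.
    for chunk in reversed(chunks):
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            break
        kept.append(chunk)
        used += cost
    kept.reverse()  # restore the original oldest-to-newest order
    return kept


def whitespace_count(text: str) -> int:
    """Stand-in counter for demonstration only; NOT a real tokenizer."""
    return len(text.split())
```

Passing the counter as an argument keeps the budgeting logic independent of any one model, so swapping in the provider's tokenizer should not require changing the trimming code.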
Key idea
You are billed per token with input and output counted separately, so measuring and trimming tokens directly controls cost.