
Token Cost and Pricing

Why you pay per token, and how to estimate and control that cost.

Tokens are the billing unit

API pricing is almost always per token, counted separately for input and output. Most providers bill in tokens rather than words or characters, so understanding tokens means understanding your bill.

Input versus output

  • Input tokens cover your prompt, system message, and any context you send.
  • Output tokens cover what the model generates.
  • Output is usually priced higher per token than input.
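Here is a minimal sketch of the arithmetic. It assumes prices are quoted per million tokens, which is a common convention, but check your provider's price sheet. The cost of one request is the input tokens times the input rate plus the output tokens times the output rate. The `Pricing` class, the `request_cost` function, and the dollar figures are hypothetical and only illustrate the calculation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pricing:
    # Hypothetical prices in US dollars per million tokens; real rates vary by provider and model.
    input_per_million: float
    output_per_million: float

def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one request, with input and output tokens billed at separate rates."""
    input_cost = input_tokens * pricing.input_per_million / 1_000_000
    output_cost = output_tokens * pricing.output_per_million / 1_000_000
    return input_cost + output_cost

# Made-up example rates in which output costs 4x as much as input per token.
example = Pricing(input_per_million=2.50, output_per_million=10.00)

# A 1,200-token prompt that produces a 300-token answer.
cost = request_cost(1_200, 300, example)
print(f"${cost:.4f}")
```

With these made-up rates, the 300-token answer costs as much as the 1,200-token prompt. The higher output price is what makes short answers expensive relative to their length.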

Rules of thumb

For English, a token is often around three to four characters, or loosely three quarters of a word. Other languages and code often split into many more tokens for the same amount of text, so always measure rather than assume.
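You can turn the rule of thumb into a quick sanity check. This sketch uses hypothetical function names. It gives a token range for English text from its character count, at three to four characters per token. It also gives a point estimate from the word count: since a token is about three quarters of a word, tokens are roughly words / 0.75. Treat the result as a ballpark only, because billing follows the provider's actual tokenizer.

```python
import math

def estimate_token_range(text: str) -> tuple[int, int]:
    """Rough (low, high) token count for English text at 4 to 3 characters per token."""
    n = len(text)
    low = math.ceil(n / 4)   # ~4 characters per token gives the smaller estimate
    high = math.ceil(n / 3)  # ~3 characters per token gives the larger estimate
    return low, high

def estimate_tokens_from_words(text: str) -> int:
    """Rough token count from words, treating one token as ~0.75 of an English word."""
    words = len(text.split())
    return round(words / 0.75)

sample = "Tokens are the billing unit for most language model APIs."
print(estimate_token_range(sample))        # range derived from character count
print(estimate_tokens_from_words(sample))  # point estimate derived from word count
```

These heuristics are tuned for English prose. They can badly undercount code or other languages, which is exactly the case where measuring with the real tokenizer matters.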

Controlling spend

  • Trim redundant context and long system prompts.
  • Cap output length when you can.
  • Cache reusable prompt prefixes, such as a long fixed system prompt, if the provider supports it.
  • Measure with the model's actual tokenizer, since tokenizers differ across models and rough estimates drift.
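Two of these levers, trimming context and capping output, can be sketched together. This is a minimal illustration under a few assumptions. Token counts use the crude four-characters-per-token estimate rather than a real tokenizer, and prices are made-up per-million rates. The helpers `rough_tokens`, `trim_history`, and `worst_case_cost` are hypothetical, not any provider's API. In practice you set the output cap through the provider's own maximum-output parameter, whose name varies.

```python
import math

def rough_tokens(text: str) -> int:
    # Crude English-only estimate (~4 characters per token); use the real tokenizer for billing.
    return math.ceil(len(text) / 4)

def trim_history(system: str, messages: list[str], input_budget: int) -> list[str]:
    """Keep the system prompt plus as many of the most recent messages as fit the input budget."""
    used = rough_tokens(system)
    kept: list[str] = []
    for msg in reversed(messages):  # walk from newest to oldest
        cost = rough_tokens(msg)
        if used + cost > input_budget:
            break  # stop at the first message that does not fit
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept

def worst_case_cost(input_tokens: int, max_output_tokens: int,
                    input_per_million: float, output_per_million: float) -> float:
    """Upper bound on one request's cost when output is capped at max_output_tokens."""
    return (input_tokens * input_per_million
            + max_output_tokens * output_per_million) / 1_000_000

system = "You are a helpful assistant."
history = ["x" * 400, "y" * 400, "z" * 400]  # three ~100-token messages, oldest first
kept = trim_history(system, history, input_budget=250)
input_tokens = rough_tokens(system) + sum(rough_tokens(m) for m in kept)
print(len(kept), input_tokens)
print(f"worst case: ${worst_case_cost(input_tokens, 500, 2.50, 10.00):.4f}")
```

The loop stops at the first message that does not fit rather than skipping it, so the retained history stays contiguous. Because the output is capped, the output side of the bill cannot exceed `max_output_tokens` times the output rate. That means the worst case is known before the request is sent.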

Key idea

You are billed per token with input and output counted separately, so measuring and trimming tokens directly controls cost.

Check yourself

1. How is language model API usage typically billed?

2. Roughly how much text is one English token?