Machine Learning

Temperature and Sampling

Controlling how random or focused a model's next token choice is.

Choosing the next token

A language model outputs a probability for every token in its vocabulary. Sampling is the act of picking one token from that distribution to continue the text.
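
As a rough illustration of that step, the sketch below draws tokens from a made-up distribution using only Python's standard library. The four-word vocabulary, the probability values, and the helper name `sample_token` are assumptions chosen for the example; a real model's vocabulary has tens of thousands of entries.

```python
import random

# Hypothetical toy vocabulary and next-token probabilities (illustrative values only).
vocab = ["cat", "dog", "fish", "xylophone"]
probs = [0.5, 0.3, 0.15, 0.05]


def sample_token(vocab, probs, rng):
    """Pick one token at random, weighted by its probability."""
    return rng.choices(vocab, weights=probs, k=1)[0]


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
draws = [sample_token(vocab, probs, rng) for _ in range(1000)]

# Count how often each token was drawn; likely tokens should dominate.
counts = {tok: draws.count(tok) for tok in vocab}
print(counts)
```

Over many draws the counts roughly follow the probabilities, but any single draw can still land on a rare token.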

The temperature knob

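Temperature rescales the model's raw scores (logits) before they are turned into probabilities and sampled. Each logit is divided by the temperature, so a temperature of 1 leaves the distribution unchanged.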

  • Low temperature (below 1) sharpens the distribution toward the most likely tokens, giving safer but often more repetitive text
  • High temperature (above 1) flattens it, giving more surprising and creative text at some cost to coherence
  • A temperature near zero is almost deterministic and approaches greedy decoding, which always picks the top token
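
A minimal sketch of the three cases above, assuming the common formulation in which raw scores (logits) are divided by the temperature before a softmax. The function name `softmax_with_temperature` and the logit values are illustrative, not taken from any particular library.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing each one by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three tokens

cold = softmax_with_temperature(logits, 0.1)     # sharpened: top token dominates
neutral = softmax_with_temperature(logits, 1.0)  # plain softmax
hot = softmax_with_temperature(logits, 10.0)     # flattened: close to uniform

for name, dist in [("cold", cold), ("neutral", neutral), ("hot", hot)]:
    print(name, [round(p, 3) for p in dist])
```

A temperature of exactly zero would divide by zero here, which is why implementations usually treat it as a special case and pick the top token directly.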

Trimming the tail

Pure sampling can occasionally pick very unlikely tokens, producing nonsense. Two common filters help.

  • Top k sampling keeps only the k most likely tokens, then samples among them
  • Top p sampling, also called nucleus sampling, keeps the smallest set of the most likely tokens whose probabilities add up to at least p, then samples among them

These filters remove the long tail of bad options while still allowing variety.
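
Both filters can be sketched in a few lines of plain Python. The version below assumes the input is already a probability list, zeroes out the dropped tokens, and renormalizes the rest so they sum to 1. The function names `top_k_filter` and `top_p_filter` and the example probabilities are hypothetical.

```python
def _ranked_indices(probs):
    """Token indices ordered from most to least likely."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def _renormalize(probs, keep):
    """Zero out tokens outside `keep` and rescale the rest to sum to 1."""
    kept_total = sum(probs[i] for i in keep)
    return [probs[i] / kept_total if i in keep else 0.0 for i in range(len(probs))]


def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    if k < 1:
        raise ValueError("k must be at least 1")
    keep = set(_ranked_indices(probs)[:k])
    return _renormalize(probs, keep)


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    keep, cumulative = set(), 0.0
    for i in _ranked_indices(probs):
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break  # the nucleus is complete
    return _renormalize(probs, keep)


probs = [0.5, 0.3, 0.15, 0.05]  # illustrative distribution over four tokens

print(top_k_filter(probs, 2))    # only the two most likely tokens survive
print(top_p_filter(probs, 0.9))  # the least likely token is dropped
```

Top k always keeps a fixed number of tokens, whereas top p keeps more tokens when the distribution is flat and fewer when it is peaked.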

Key idea

Temperature scales randomness while top k and top p trim unlikely tokens, together tuning the balance of creativity and coherence.

Check yourself

1. What does a low temperature do to generation?

2. What does top p sampling keep?