Temperature and Sampling

Choosing the next token

A language model outputs a probability for every token in its vocabulary. Sampling is the act of picking one token from that distribution to continue the text.
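A minimal sketch of that step in Python. It assumes a hypothetical four-word vocabulary and made-up raw scores (logits). It turns the scores into probabilities with a softmax and then draws one token at random, weighted by those probabilities.

```python
import math
import random

# Hypothetical toy vocabulary and raw model scores (logits); a real model
# produces one logit per token in a vocabulary of tens of thousands.
vocab = ["cat", "dog", "car", "the"]
logits = [2.0, 1.0, 0.5, -1.0]

def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)

# Sampling: draw one token, each with a chance equal to its probability.
rng = random.Random(0)  # seeded only so the run is repeatable
next_token = rng.choices(vocab, weights=probs, k=1)[0]
print(next_token)
```

Running it repeatedly with different seeds usually gives "cat", the highest-scoring token, but sometimes returns one of the others.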

The temperature knob

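Temperature rescales the distribution before sampling. The model's raw scores (logits) are divided by the temperature T before they are converted into probabilities, so a temperature of 1 leaves the distribution unchanged.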
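In symbols, assuming the model gives each token i a raw score (logit) z_i, the probability of token i at temperature T > 0 is the softmax of the scaled logits:

```latex
p_i(T) = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}
```

At T = 1 this is the ordinary softmax. As T shrinks toward 0, the largest logit dominates the sum. As T grows, every z_i / T approaches 0 and the probabilities approach a uniform distribution.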

Low temperature (below 1) sharpens the distribution toward the most likely tokens, giving safer, more repetitive text
High temperature (above 1) flattens it, giving more surprising and creative, but less coherent, text
A temperature near zero is almost deterministic: it nearly always picks the single most likely token, like greedy decoding
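The three effects above can be seen numerically in a short sketch. The logits are hypothetical. The function divides each logit by the temperature and applies a softmax, which is the standard way temperature is implemented.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide each logit by the temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # shift by the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate tokens.
logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.1, 1.0, 5.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At T=0.1 the top token takes almost all of the probability. At T=5.0 the four probabilities end up close to one another, so unlikely tokens are picked far more often.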

Trimming the tail

Pure sampling can occasionally pick very unlikely tokens, producing nonsense. Two common filters help.

Top-k sampling keeps only the k most likely tokens, renormalizes their probabilities, then samples among them
Top-p sampling, also called nucleus sampling, keeps the smallest set of most likely tokens whose probabilities add up to at least p, then renormalizes and samples among them
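A minimal sketch of both filters, assuming the next-token probabilities are already available as a plain list. The function names are illustrative, not from any particular library. Each filter zeroes out the tokens it drops and rescales the survivors so they sum to 1.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero out the rest, and renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(ranked[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose probabilities
    sum to at least p, zero out the rest, and renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set()
    cumulative = 0.0
    for i in ranked:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

# Hypothetical next-token distribution over five tokens.
probs = [0.50, 0.25, 0.15, 0.07, 0.03]

print(top_k_filter(probs, 2))     # only the two most likely tokens survive
print(top_p_filter(probs, 0.85))  # 0.50 + 0.25 + 0.15 = 0.90 >= 0.85, so three survive
```

The two filters differ in what they fix. Top-k always keeps the same number of candidates. Top-p keeps fewer when the model is confident and more when probability is spread across many tokens.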

These filters remove the long tail of bad options while still allowing variety.

Key idea

Temperature scales randomness, while top-k and top-p trim unlikely tokens; together they tune the balance between creativity and coherence.
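Putting the pieces together, here is a sketch of a single sampling step. It assumes one common ordering: apply temperature to the logits, then top-k, then top-p, then draw. The function `sample_next_token` and its parameters are hypothetical, not a specific library's API. Top-p here is measured against the probability mass left after top-k, so it effectively works on the renormalized distribution.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Hypothetical sampler: temperature, then top-k, then top-p, then one draw.
    Returns the index of the chosen token."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # 1. Temperature: scale the logits, then softmax (shifted by the max for stability).
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Candidate indices sorted from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # 2. Top-k: keep only the k most likely candidates.
    if top_k is not None:
        ranked = ranked[:top_k]

    # 3. Top-p: keep the shortest prefix whose share of the remaining mass reaches p.
    if top_p is not None:
        remaining = sum(probs[i] for i in ranked)
        kept, mass = [], 0.0
        for i in ranked:
            kept.append(i)
            mass += probs[i]
            if mass >= top_p * remaining:
                break
        ranked = kept

    # 4. Draw one survivor; choices() normalizes the weights itself.
    return rng.choices(ranked, weights=[probs[i] for i in ranked], k=1)[0]

# Usage with a hypothetical vocabulary and logits.
vocab = ["cat", "dog", "car", "the"]
logits = [2.0, 1.0, 0.5, -1.0]
idx = sample_next_token(logits, temperature=0.8, top_k=3, top_p=0.9, rng=random.Random(0))
print(vocab[idx])
```

Applying temperature first means it also changes how many tokens top-p keeps. A sharper distribution reaches p with fewer tokens.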
