Choosing the next token
A language model outputs a probability for every token in its vocabulary. Sampling is the act of picking one token from that distribution to continue the text.
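As a minimal sketch of that step, the snippet below draws one token from a toy distribution using only Python's standard library. The four-token vocabulary, its probabilities, and the helper name sample_token are made up for illustration; a real model produces a distribution over tens of thousands of tokens.

```python
import random


def sample_token(probs, rng=random):
    """Pick one token from a {token: probability} mapping (hypothetical helper)."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    # random.choices draws according to the given weights.
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy distribution, assumed for illustration only.
toy_probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "xylophone": 0.05}

# A seeded generator makes the draw repeatable between runs.
next_token = sample_token(toy_probs, random.Random(0))
```

Calling sample_token repeatedly would return "cat" about half the time and "xylophone" only rarely, which is the behavior the rest of these notes set out to control.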
The temperature knob
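Temperature T rescales the distribution before sampling: the model's raw scores (logits) are divided by T and then passed through softmax, so T = 1 leaves the distribution unchanged.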
- Low temperature sharpens the distribution toward the most likely tokens, giving safer but more repetitive text
- High temperature flattens it, giving more surprising and creative text
- A temperature near zero is almost deterministic and usually picks the top token
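The effect of the knob can be sketched in a few lines, assuming the common formulation in which logits are divided by the temperature before softmax. The function name and the toy logits are illustrative assumptions, not taken from any particular library.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Toy logits for a three-token vocabulary (assumed values).
toy_logits = [2.0, 1.0, 0.0]

sharp = softmax_with_temperature(toy_logits, 0.5)  # low temperature
plain = softmax_with_temperature(toy_logits, 1.0)  # unchanged distribution
flat = softmax_with_temperature(toy_logits, 2.0)   # high temperature
```

Comparing the three lists, the top token's share is largest in sharp and smallest in flat. Temperature exactly zero is rejected here because it would divide by zero; implementations typically treat that case as simply picking the top token.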
Trimming the tail
Pure sampling can occasionally pick very unlikely tokens, producing nonsense. Two common filters help.
- Top k sampling keeps only the k most likely tokens, renormalizes their probabilities, then samples among them
- Top p sampling, also called nucleus sampling, keeps the smallest set of most likely tokens whose probabilities add up to at least p, then renormalizes and samples among them
These filters remove the long tail of bad options while still allowing variety.
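Both filters can be sketched as small functions over a token-to-probability mapping. This is a simplified illustration under assumed names (top_k_filter, top_p_filter) and toy numbers; production implementations usually work on logit tensors and may handle ties and edge cases differently.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose mass reaches p, then renormalize."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break  # the nucleus now covers at least p of the probability mass
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Toy distribution, assumed for illustration only.
toy_probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "xylophone": 0.05}

top_two = top_k_filter(toy_probs, k=2)    # keeps "cat" and "dog"
nucleus = top_p_filter(toy_probs, p=0.9)  # keeps "cat", "dog", and "fish"
```

Note the difference in behavior: top k always keeps a fixed number of tokens, while the size of the top p set shrinks when the model is confident and grows when the distribution is spread out.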
Key idea
Temperature scales randomness while top k and top p trim unlikely tokens, together tuning the balance of creativity and coherence.