Controlling randomness
When a model generates text, it predicts a probability distribution over possible next tokens. Sampling parameters decide how that distribution becomes a single choice. The three common knobs are temperature, top p, and top k.
What each knob does
- Temperature rescales the model's scores (logits) before they are turned into probabilities. Low values sharpen the distribution toward the top token, giving focused, nearly deterministic output. High values flatten it, adding variety and risk.
- Top k keeps only the k most likely tokens and samples among them, cutting off the long tail.
- Top p keeps the smallest set of most likely tokens whose probabilities sum to at least p, a nucleus that adapts to how confident the model is.
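The three knobs above can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular library's implementation: the function names (`softmax`, `top_k_filter`, `top_p_filter`, `sample`) are hypothetical, tokens are represented by their index in the logits list, and each filter renormalizes the surviving probabilities so they sum to 1.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most likely token indices and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = order[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest top-ranked set whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def sample(dist, rng=random):
    """Draw one token index from a {index: probability} mapping."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

A usage sketch with made-up logits: `sample(top_p_filter(softmax([2.0, 1.0, 0.5, -1.0], temperature=0.7), 0.9))` applies temperature first and then the nucleus cut. That ordering is an assumption here; real inference stacks may apply the filters in a different order.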
Choosing settings
- For factual or structured tasks, use low temperature for consistency.
- For creative tasks, raise temperature or top p for diversity.
- Avoid stacking aggressive limits from several knobs at once, since they interact.
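To see why stacked limits interact, the sketch below counts how many tokens survive a top p cut at two temperatures. The helper `nucleus_size` and the logits are made up for illustration, and the sketch assumes temperature is applied before the top p cut.

```python
import math

def nucleus_size(logits, temperature, top_p):
    """Count tokens left in the top-p nucleus after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = sorted((e / total for e in exps), reverse=True)
    cumulative = 0.0
    for count, prob in enumerate(probs, start=1):
        cumulative += prob
        if cumulative >= top_p:
            return count
    return len(probs)

logits = [2.0, 1.5, 1.0, 0.5, 0.0]  # made-up scores for five tokens
for temperature in (1.0, 0.3):
    size = nucleus_size(logits, temperature, top_p=0.8)
    print(f"temperature={temperature}: {size} token(s) in the nucleus")
```

With these made-up logits, the same top p of 0.8 keeps three tokens at temperature 1.0 but only one at temperature 0.3, so the combination behaves like always picking the top token even though neither setting looks extreme on its own.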
A practical note
These knobs change variety, not correctness. A confident model at low temperature can still be wrong, and high temperature does not add knowledge. Tune them to the task rather than chasing a single best setting.
Key idea
Temperature, top p, and top k shape how the next-token distribution is sampled, trading focus for variety. They tune diversity, not correctness.