What it is
When a language model generates text, it produces a probability for every possible next token. Top k and top p sampling are rules for trimming that list before picking one token at random, trading off quality against variety.
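As a baseline, here is a minimal sketch of untrimmed sampling. It assumes the model hands back raw scores (logits) that are turned into probabilities with a softmax. The five-token vocabulary and its scores are made up for illustration, and `softmax` and `sample` are hypothetical helper names.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng):
    """Pick one token index at random, weighted by its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical scores for a five-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0, -3.0]
probs = softmax(logits)
rng = random.Random(0)
token = sample(probs, rng)  # every token, however unlikely, can be drawn
```

Without any trimming, even the least likely token keeps a small nonzero chance of being picked.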
How each works
Both methods restrict the candidate set, then sample from what remains.
- Top k keeps the k most likely tokens and discards the rest, so with k set to forty the model samples from the forty most likely candidates
- Top p, also called nucleus sampling, keeps the smallest set of most likely tokens whose probabilities add up to at least p, such as ninety-five percent
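Both rules can be sketched as filters that zero out discarded tokens and renormalize the survivors. This is a minimal illustration that follows the definitions above; the function names are hypothetical, and real libraries differ in details such as tie handling and whether a minimum number of tokens is always kept.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero out the rest, and renormalize."""
    # Indices sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose total is at least p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set()
    cumulative = 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break  # the nucleus now covers at least p of the probability mass
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

# A hypothetical five-token distribution, already sorted for readability.
probs = [0.5, 0.2, 0.15, 0.1, 0.05]
by_count = top_k_filter(probs, 2)   # keeps the top two tokens
by_mass = top_p_filter(probs, 0.8)  # 0.5 + 0.2 + 0.15 reaches 0.8, so three tokens
```

Renormalizing is what lets the usual weighted draw run on the trimmed list unchanged.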
The difference is that top k uses a fixed count while top p adapts the count to the shape of the distribution. When the model is confident, top p keeps only a few tokens, and when it is uncertain, it keeps more.
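To see the adaptive behavior, the sketch below counts how many tokens top p keeps for two made-up distributions over the same six-token vocabulary: one where the model is confident and one where it is spread out. The distributions and the `nucleus_size` helper are hypothetical.

```python
def nucleus_size(probs, p):
    """Number of tokens top p keeps: the shortest run of the sorted
    probabilities, starting from the largest, whose sum reaches at least p."""
    cumulative = 0.0
    for count, q in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += q
        if cumulative >= p:
            return count
    return len(probs)  # guard against rounding leaving the sum just below p

# Hypothetical distributions over the same six-token vocabulary.
confident = [0.92, 0.03, 0.02, 0.01, 0.01, 0.01]
uncertain = [0.2, 0.2, 0.2, 0.2, 0.1, 0.1]

confident_count = nucleus_size(confident, 0.85)  # the top token alone covers 0.85
uncertain_count = nucleus_size(uncertain, 0.85)  # needs five tokens to reach 0.85
```

With top k set to three, both cases would keep exactly three tokens, even though the confident case arguably needs only one.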
Tuning behavior
These knobs work alongside temperature, which is typically applied first to sharpen or flatten the distribution.
- A small k or p makes output more focused and repetitive
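- A large k or p makes output more varied but more likely to drift into incoherent or off-topic text
- Either way, every token outside the kept set gets zero probability, so the long tail of very unlikely tokens can never be sampled by accident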
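Putting the pieces together, here is a minimal sketch of one sampling step. The order used (temperature, then top k, then top p, then a weighted draw) is an assumption that mirrors common practice, not a rule every library follows, and `sample_next_token` and its parameters are hypothetical names. Here top p is measured over the tokens that survive top k, after renormalizing.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Hypothetical sampler: temperature, then top k, then top p, then a weighted draw.
    Assumes temperature > 0 and, if given, top_k >= 1 and 0 < top_p <= 1."""
    # 1. Temperature: dividing logits by T < 1 sharpens, T > 1 flattens.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Candidate indices, most probable first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # 2. Top k: keep only the k most probable candidates.
    if top_k is not None:
        order = order[:top_k]

    # 3. Top p: keep the shortest prefix whose renormalized mass reaches p.
    if top_p is not None:
        kept_mass = sum(probs[i] for i in order)
        cumulative = 0.0
        nucleus = []
        for i in order:
            nucleus.append(i)
            cumulative += probs[i] / kept_mass
            if cumulative >= top_p:
                break
        order = nucleus

    # 4. Weighted draw among survivors; choices() accepts unnormalized weights.
    weights = [probs[i] for i in order]
    return rng.choices(order, weights=weights, k=1)[0]

# Hypothetical scores for a five-token vocabulary.
logits = [3.0, 2.5, 0.0, -2.0, -5.0]
rng = random.Random(42)
picks = [sample_next_token(logits, temperature=0.7, top_k=3, top_p=0.9, rng=rng)
         for _ in range(1000)]
# With these settings only tokens 0 and 1 survive both filters.
```

Setting `top_k=1` reduces this to always picking the single most likely token, which shows how shrinking the kept set trades variety for focus.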
Key idea
Top k keeps a fixed number of likely tokens while top p keeps the smallest set covering a fixed probability mass, and both trim choices before sampling.