Top K and Top P Sampling

Decoding rules that trim a language model's choices before sampling the next token.

What it is

When a language model generates text, it produces a probability for every possible next token. Top k and top p sampling are rules for trimming that list before one token is picked at random, balancing quality against variety.

How each works

Both methods restrict the candidate set, then sample from what remains.

Top k keeps the k most likely tokens and discards the rest, so with k = 40 only the forty most likely candidates are considered
Top p, also called nucleus sampling, keeps the smallest set of most likely tokens whose probabilities add up to at least p, for example p = 0.95 (ninety five percent of the probability mass)

The difference is that top k uses a fixed count while top p adapts. When the model is confident, top p keeps few tokens, and when it is uncertain, it keeps more.
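
The contrast between a fixed count and an adaptive one can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation: the function names top_k_filter and top_p_filter, the toy distributions, and the choice to renormalize the kept probabilities so they sum to one are all assumptions made for the example.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize (hypothetical helper)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = ranked[:k]
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose mass reaches p, then renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        mass += prob
        if mass >= p:  # stop as soon as the cumulative mass reaches p
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


# Toy next-token distributions (made-up numbers for illustration only).
confident = {"the": 0.90, "a": 0.06, "an": 0.02, "this": 0.01, "that": 0.01}
uncertain = {"the": 0.30, "a": 0.25, "an": 0.20, "this": 0.15, "that": 0.10}

# Top k keeps the same number of tokens in both cases.
print(len(top_k_filter(confident, 2)), len(top_k_filter(uncertain, 2)))  # 2 2

# Top p keeps one token when the model is confident, four when it is not.
print(len(top_p_filter(confident, 0.8)), len(top_p_filter(uncertain, 0.8)))  # 1 4
```

With the same threshold of 0.8, the confident distribution is covered by a single token while the uncertain one needs four, which is the adaptive behavior described above.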

Tuning behavior

These knobs work alongside temperature, which sharpens or flattens the distribution first.

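A small k or p makes output more focused but more repetitive
A large k or p makes output more varied but more likely to include unlikely, off-topic tokens
Both cut off the low probability tail, so a token outside the kept set can never be picked, although a very large k or a p close to one leaves most of the tail in place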
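
One way the three knobs might be combined is sketched below. It assumes the common ordering of temperature first, then top k, then top p; real libraries differ in details such as whether probabilities are renormalized between the two filters. The names softmax_with_temperature and sample_next, and the example logits, are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; temperature below 1 sharpens, above 1 flattens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits, temperature=1.0, k=None, p=None, rng=random):
    """Return the index of one sampled token after temperature, top k, and top p."""
    probs = softmax_with_temperature(logits, temperature)
    # Token indices ordered from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if k is not None:
        order = order[:k]  # top k: fixed number of candidates
    if p is not None:
        kept, mass = [], 0.0
        for i in order:  # top p: smallest prefix whose mass reaches p
            kept.append(i)
            mass += probs[i]
            if mass >= p:
                break
        order = kept
    # random.choices accepts unnormalized weights, so no explicit renormalization is needed.
    weights = [probs[i] for i in order]
    return rng.choices(order, weights=weights, k=1)[0]


logits = [5.0, 2.0, 1.0, 0.1, -1.0]  # made-up scores for five tokens
rng = random.Random(0)
draws = [sample_next(logits, temperature=1.5, k=3, p=0.95, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))  # only indices from the kept set can appear
```

Setting k=1 reduces this to always choosing the most likely token, the most focused setting possible, while leaving k and p as None samples from the full temperature-adjusted distribution.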

Key idea

Top k keeps a fixed number of likely tokens while top p keeps a probability mass, and both trim choices before sampling.
