Machine Learning

Top K and Top P Sampling

Decoding rules that trim a language model's choices before sampling the next token.

What it is

When a language model generates text, it produces a probability for every possible next token. Top k and top p sampling are rules for trimming that list before one token is picked at random, balancing quality against variety.

How each works

Both methods restrict the candidate set, rescale the surviving probabilities so they sum to one, then sample from what remains.

  • Top k keeps the k most likely tokens and discards the rest, so with k set to forty only the forty most likely tokens remain as candidates
  • Top p, also called nucleus sampling, keeps the smallest set of the most likely tokens whose probabilities add up to at least p, such as 0.95 (ninety five percent)

The difference is that top k uses a fixed count while top p adapts. When the model is confident, top p keeps few tokens, and when it is uncertain, it keeps more.
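The contrast between a fixed count and an adaptive set can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular library implements it: the function names `top_k_filter` and `top_p_filter`, the five-token vocabulary, and the probabilities are all made up for the example.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and rescale them to sum to 1."""
    # Rank tokens from most to least likely.
    ranked = sorted(probs, key=probs.get, reverse=True)
    kept = ranked[:k]
    total = sum(probs[t] for t in kept)
    return {t: probs[t] / total for t in kept}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose mass reaches p, then rescale."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    kept, mass = [], 0.0
    for token in ranked:
        kept.append(token)
        mass += probs[token]
        if mass >= p:  # stop as soon as the nucleus covers at least p
            break
    return {t: probs[t] / mass for t in kept}


# Hypothetical next-token distributions over a toy vocabulary.
confident = {"the": 0.90, "a": 0.05, "an": 0.03, "this": 0.01, "that": 0.01}
uncertain = {"the": 0.25, "a": 0.22, "an": 0.20, "this": 0.18, "that": 0.15}

# Top k keeps the same number of tokens in both cases.
print(len(top_k_filter(confident, 3)), len(top_k_filter(uncertain, 3)))  # 3 3

# Top p keeps one token when the model is confident and four when it is not.
print(len(top_p_filter(confident, 0.8)), len(top_p_filter(uncertain, 0.8)))  # 1 4
```

With p set to 0.8, the confident distribution is covered by its single top token, while the uncertain one needs four tokens to reach the same share of probability.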

Tuning behavior

These knobs work alongside temperature, which reshapes the distribution first: a temperature below one sharpens it and a temperature above one flattens it.

  • A small k or p makes output more focused and repetitive
  • A large k or p makes output more varied but riskier
  • Both cut off the long tail of the distribution, so a token outside the kept set can never be picked by accident
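How the three knobs combine in one decoding step can be sketched as follows. This is a rough illustration under stated assumptions: the order used here (temperature, then top k, then top p) is one common arrangement rather than a universal rule, and the names `softmax_with_temperature` and `sample_next` as well as the logits are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits, temperature=1.0, k=None, p=None, rng=random):
    """Apply temperature, then optional top k and top p, then sample a token index."""
    probs = softmax_with_temperature(logits, temperature)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if k is not None:
        order = order[:k]  # top k: fixed number of candidates
    if p is not None:
        total = sum(probs[i] for i in order)
        kept, mass = [], 0.0
        for i in order:  # top p: smallest prefix covering a share p
            kept.append(i)
            mass += probs[i] / total
            if mass >= p:
                break
        order = kept
    weights = [probs[i] for i in order]  # random.choices rescales the weights
    return rng.choices(order, weights=weights, k=1)[0]


logits = [2.0, 1.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.5]  # made-up scores
rng = random.Random(0)  # seeded so repeated runs give the same draws

# Collect the distinct token indices seen over many draws.
narrow = {sample_next(logits, k=2, rng=rng) for _ in range(500)}
wide = {sample_next(logits, k=8, rng=rng) for _ in range(500)}
print(sorted(narrow), len(wide))
```

With k set to 2, only the two highest-scoring indices can ever appear, however many draws are made, while k set to 8 lets the rarer indices through as well.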

Key idea

Top k keeps a fixed number of likely tokens while top p keeps a probability mass, and both trim choices before sampling.

Check yourself

1. How does top p sampling choose its candidate set?

2. What happens to output diversity when k or p is made very small?