Machine Learning

Perplexity

A standard score for how well a language model predicts text.

What it measures

Perplexity measures how surprised a language model is by a piece of text. A lower perplexity means the model assigned higher probability to the actual words, so it predicted better.

The intuition

Think of perplexity as the average number of equally likely choices the model felt it had at each step. If a model is perfectly confident and correct, perplexity approaches one, its lowest possible value. If it guesses uniformly over its whole vocabulary, perplexity equals the vocabulary size.

  • It is computed from the model's probability for each true token
  • It is the exponential of the average negative log probability
  • Lower is better
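
The three points above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `perplexity`, the example probabilities, and the 50,000-token vocabulary are all assumptions chosen for the example, and the natural logarithm is used throughout.

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to each true token.

    token_probs is a hypothetical list of floats in (0, 1], one per token.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log probability (natural log) of the true tokens.
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Exponentiating turns the average back into an "effective number of choices".
    return math.exp(avg_nll)

# A confident, correct model: perplexity close to one.
confident = perplexity([0.9, 0.95, 0.99])

# A uniform guess over an assumed 50,000-token vocabulary:
# perplexity equals the vocabulary size (up to floating-point rounding).
uniform = perplexity([1 / 50_000] * 4)

print(confident, uniform)
```

In practice the per-token probabilities would come from a model's output distribution; here they are typed in by hand to keep the arithmetic visible.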

Cautions

Perplexity comparisons are only fair between models that share the same tokenizer and vocabulary and are scored on the same text, since the unit of prediction changes the number. Perplexity also measures prediction quality, not usefulness. A model can have low perplexity yet still be unhelpful or produce unsafe text, which is why task-based evaluation remains essential.
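
To see why the tokenizer matters, consider a sketch with made-up numbers: two hypothetical models assign the same total log probability to the same sentence, but one splits it into 6 tokens and the other into 12. The helper name `perplexity_from_total` and all values below are assumptions for illustration only.

```python
import math

def perplexity_from_total(total_log_prob, num_tokens):
    """Per-token perplexity from a sequence's total log probability (natural log)."""
    return math.exp(-total_log_prob / num_tokens)

# Assumed: both models give the whole sentence the same total log probability.
total_log_prob = -24.0

# Hypothetical tokenizers: one yields 6 tokens, the other 12 for the same sentence.
ppl_coarse = perplexity_from_total(total_log_prob, 6)
ppl_fine = perplexity_from_total(total_log_prob, 12)

print(ppl_coarse, ppl_fine)
```

The model with more, smaller tokens reports a lower per-token perplexity even though both assigned the sentence exactly the same probability, so the raw numbers are not comparable across tokenizers.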

Key idea

Perplexity is the exponential of the average negative log probability of the true tokens, where lower means the model predicts text better.

Check yourself

1. What does a lower perplexity indicate?

2. When is comparing perplexity between two models fair?