Machine Learning

Perplexity For Language Models

How surprised a model is by real text, and why lower is better.

5 min read · advanced

Measuring surprise

Perplexity evaluates how well a language model predicts a text. Intuitively, it measures how surprised the model is by the words that actually come next. A lower perplexity means the model assigned higher probability to what really came next.

From cross entropy to perplexity

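Perplexity is the exponential of the average per-token cross entropy, where the cross entropy is the average negative log probability the model assigned to each token that actually occurred. A uniform choice among k options has cross entropy log k, and exponentiating gives back k, so perplexity can be read as an effective branching factor.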

  • A perplexity of one means perfect prediction with no surprise.
  • A perplexity of fifty means the model is as uncertain as choosing uniformly among fifty options at each step.
  • Lower perplexity means the model concentrates more of its probability on the tokens that actually occur.
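
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `perplexity` and the example probabilities are assumptions made up for this sketch, and it uses natural logarithms throughout so that the log and the exponential share a base.

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to the tokens that actually occurred."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log probability: the cross entropy in nats per token.
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Exponentiating turns the average back into an effective number of choices.
    return math.exp(avg_nll)

# Illustrative (made-up) inputs matching the cases in the list above.
perfect = perplexity([1.0, 1.0, 1.0])     # every true token got probability 1
uniform_50 = perplexity([1 / 50] * 10)    # every true token got probability 1/50
mixed = perplexity([0.5, 0.25, 0.125])    # uneven confidence across three tokens
```

Here `perfect` comes out as 1 and `uniform_50` as approximately 50, matching the branching factor reading. The `mixed` case gives 4, the geometric mean of the inverse probabilities 2, 4 and 8.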

Cautions

  • Perplexity depends on the tokenization, so scores are directly comparable only between models that share a vocabulary and are evaluated on the same text.
  • It rewards fluency and probability, not factual correctness.
  • A model can have low perplexity yet still produce confident falsehoods.
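
The tokenization caution can be made concrete with a small sketch. All numbers here are hypothetical: it assumes two models assign the same total log probability to one sentence but split it into different numbers of tokens. The helper name `perplexity_from_total` is an illustrative choice, not a library function.

```python
import math

def perplexity_from_total(total_log_prob, num_units):
    """Perplexity from a text's total natural-log probability and the number of units it is averaged over."""
    return math.exp(-total_log_prob / num_units)

# Hypothetical: both models give the sentence the same total log probability.
total_log_prob = -20.0
word_level_tokens = 5    # assumed token count under a word-level tokenizer
subword_tokens = 10      # assumed token count under a subword tokenizer

# Same overall fit to the text, very different per-token perplexity.
ppl_word_level = perplexity_from_total(total_log_prob, word_level_tokens)
ppl_subword = perplexity_from_total(total_log_prob, subword_tokens)
```

Under these assumptions the subword model reports a much lower per-token perplexity (about 7.4, against about 54.6), even though both models fit the sentence equally well. One common workaround is to average over a unit that both models share, such as words or characters, before comparing.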

Key idea

Perplexity is the exponential of the average per-token cross entropy. It acts as an effective branching factor that shows how surprised a model is by real text. Lower is better, but it measures fluency and probability, not truth.

Check yourself


1. A lower perplexity indicates what about a language model?

2. Why can it be misleading to compare perplexity scores across models?