The F1 Score

Why combine two metrics

Precision and recall each tell only half the story, and they trade off against each other. Reporting both is honest but awkward when you need to rank models or tune a threshold. The F1 score folds them into a single value.

The harmonic mean

F1 is the harmonic mean of precision and recall, not the arithmetic mean: F1 = 2 * precision * recall / (precision + recall). The harmonic mean is pulled toward the smaller of the two numbers:

If precision is high but recall is near zero, F1 stays near zero.
F1 is high only when both precision and recall are high.
This penalizes models that inflate one metric while neglecting the other.
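
To make the contrast with the ordinary average concrete, here is a minimal sketch that computes both for a few precision and recall pairs. The function names and the example values are assumptions chosen for illustration, and returning 0.0 when both inputs are zero is a common convention rather than part of the definition.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall.

    Returns 0.0 when both inputs are zero (an assumed convention
    that avoids dividing by zero).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def arithmetic_mean(precision: float, recall: float) -> float:
    """Ordinary average, shown only for comparison."""
    return (precision + recall) / 2


# Hypothetical (precision, recall) pairs, not results from a real model.
examples = [(0.9, 0.9), (0.9, 0.1), (1.0, 0.0)]

for p, r in examples:
    print(
        f"precision={p:.1f} recall={r:.1f} "
        f"average={arithmetic_mean(p, r):.2f} F1={f1_score(p, r):.2f}"
    )
```

For the pair with precision 0.9 and recall 0.1, the ordinary average is 0.5 while F1 is 0.18, which is the pull toward the smaller number described in the list.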

Variants

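The general F beta score weights recall more heavily when beta is above one, which is useful when missing positives is costly, and weights precision more heavily when beta is below one. Setting beta to one recovers F1. On imbalanced data F1 is usually more informative than plain accuracy, because a model that always predicts the majority class can reach high accuracy while finding no positives at all, which gives it an F1 of zero.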
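
The sketch below illustrates both points under stated assumptions: it uses the standard F beta formula, (1 + beta^2) * precision * recall / (beta^2 * precision + recall), and a made-up confusion matrix for an imbalanced data set. The helper names and all counts are hypothetical.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F beta score; beta > 1 favors recall, beta < 1 favors precision."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # assumed convention when precision and recall are both zero
    return (1 + b2) * precision * recall / denominator


def rates_from_counts(tp: int, fp: int, fn: int, tn: int):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Effect of beta when precision (0.9) is higher than recall (0.5).
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: F={f_beta(0.9, 0.5, beta):.3f}")

# Hypothetical imbalanced case: 10 positives among 1000 samples,
# and a model that finds only one of them with no false alarms.
accuracy, precision, recall = rates_from_counts(tp=1, fp=0, fn=9, tn=990)
print(f"accuracy={accuracy:.3f} F1={f_beta(precision, recall):.3f}")
```

Because recall is the weaker metric in the first loop, the score falls as beta rises. In the imbalanced case accuracy is 0.991 while F1 is about 0.18, since the model misses nine of the ten positives.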

Key idea

The F1 score is the harmonic mean of precision and recall, rewarding models only when both are strong, with F beta tilting the balance toward one or the other.
