Beyond accuracy
Accuracy can be misleading when classes are imbalanced. If only one percent of emails are spam, a model that calls everything not spam is ninety nine percent accurate yet useless. Precision and recall give a sharper picture.
The definitions
- Precision is, of the items the model flagged positive, how many were actually positive. It answers how trustworthy a positive prediction is.
- Recall is, of all the truly positive items, how many the model caught. It answers how complete the model is.
The tradeoff
Raising the decision threshold makes the model cautious, boosting precision but lowering recall. Lowering it catches more positives, boosting recall but hurting precision. The F1 score is the harmonic mean of the two, useful when you want a single balanced number.
Which to favor
In spam filtering, precision protects real mail. In disease screening, recall protects patients by catching cases even at the cost of false alarms.
Key idea
Precision measures how trustworthy positive predictions are while recall measures how many real positives are caught, and the threshold trades one for the other.