Machine Learning

PR AUC for Imbalanced Data

Why the precision-recall curve is the more honest scoreboard when positives are rare.

5 min read · core

A different curve

The precision-recall curve plots precision against recall as the decision threshold sweeps from high to low. The area under it, PR AUC (usually estimated as average precision), summarizes the whole curve in a single number.

Why it suits imbalance

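Precision, TP / (TP + FP), and recall, TP / (TP + FN), are both built around the positive class: neither formula contains true negatives. When negatives vastly outnumber positives, that is exactly what you want, because a huge pile of easy true negatives cannot inflate the score.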

  • ROC AUC can look impressive while precision is terrible, because the false positive rate divides by the large negative count
  • PR AUC exposes the cost of false positives because precision is one of its axes
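A quick arithmetic sketch shows how the two views can disagree. The counts and the operating point below are hypothetical, chosen only to illustrate the effect of a large negative class.

```python
# Hypothetical dataset: 100 positives among 10,000 negatives (about 1% prevalence).
positives, negatives = 100, 10_000

# Hypothetical operating point that looks good on a ROC plot.
tpr = 0.90  # recall: share of positives caught
fpr = 0.05  # share of negatives wrongly flagged

tp = tpr * positives  # true positives
fp = fpr * negatives  # false positives
precision = tp / (tp + fp)

print(f"TPR={tpr:.2f}  FPR={fpr:.2f}  precision={precision:.3f}")
```

A 5 percent false positive rate sounds small, yet applied to 10,000 negatives it produces about 500 false alarms against about 90 hits, so precision lands near 0.15.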

The baseline shifts

A crucial difference from ROC: the no-skill baseline for PR AUC is not 0.5. It equals the positive prevalence, the fraction of examples that are positive.

  • If positives are 1 percent of the data, a random classifier's PR AUC is about 0.01
  • A PR AUC of 0.3 there is roughly 30 times the baseline, which is strong, not weak

Always compare PR AUC against the prevalence baseline, not against 0.5.
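The baseline can be checked with a small simulation. This is a sketch under stated assumptions: the dataset size, the seed, and the helper name are hypothetical, and the random scores stand in for a model with no skill.

```python
import random


def average_precision(labels, scores):
    """Mean of precision measured at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / len(precisions)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_pos, n_total = 500, 50_000  # hypothetical data with 1% positives
labels = [1] * n_pos + [0] * (n_total - n_pos)
scores = [rng.random() for _ in labels]  # no-skill model: scores ignore the label

baseline = sum(labels) / len(labels)  # positive prevalence
ap_random = average_precision(labels, scores)
lift = 0.3 / baseline  # how far a PR AUC of 0.3 sits above the baseline

print(f"prevalence={baseline:.3f}  random AP={ap_random:.4f}  lift of 0.3 = {lift:.0f}x")
```

The random model's average precision should land close to the prevalence, nowhere near 0.5, which is why reporting the lift over prevalence is more informative than the raw number alone.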

When to use which

Use ROC AUC for balanced data or when you care about ranking across both classes. Use PR AUC when positives are rare and false positives are expensive.

Key idea

PR AUC ignores true negatives and centers the rare positive class, making it the better summary under heavy imbalance. Judge it against prevalence, not 0.5.

Check yourself

1. What is the no-skill baseline for PR AUC?

2. Why is PR AUC preferred over ROC AUC for rare positives?