Machine Learning

Macro, Micro, and Weighted Averaging

Three ways to roll per-class scores into one number, each telling a different story.

The problem

With many classes, you get a separate precision, recall, and F1 for each class. To report a single number you must average them, and there are three common schemes.
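To make the starting point concrete, here is a minimal sketch of how the per-class scores arise from single-label predictions. The function name `per_class_scores` and the toy labels are illustrative assumptions, and undefined ratios (zero denominator) are set to 0.0 by convention here.

```python
from collections import Counter

def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[t] += 1  # the true class gains a false negative
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        # Assumption: a ratio with a zero denominator is reported as 0.0.
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores

# Toy data, invented for illustration only.
y_true = ["cat", "cat", "cat", "dog", "dog", "bird"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "cat"]
scores = per_class_scores(y_true, y_pred)
for label, (p, r, f) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Three classes give three rows of scores, which is exactly the situation that calls for an averaging scheme.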

The three averages

  • Macro computes the metric per class, then takes a plain mean. Every class counts equally, regardless of size.
  • Micro pools all true positives, false positives, and false negatives across classes, then computes the metric once. Large classes dominate.
  • Weighted averages the per-class metrics, weighting each by its support (the number of true instances of that class). It is a compromise between the two.
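The three definitions above can be sketched directly from per-class counts. This is a minimal illustration for F1 only; the helper names `f1` and `averaged_f1` and the example counts are assumptions made for the sketch, not part of any library.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when the class never appears or is never predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def averaged_f1(counts):
    """counts: {label: (tp, fp, fn)}. Returns (macro, micro, weighted) F1."""
    per_class = {c: f1(*v) for c, v in counts.items()}
    # Support = number of true instances of the class = tp + fn.
    support = {c: tp + fn for c, (tp, fp, fn) in counts.items()}
    total = sum(support.values())

    macro = sum(per_class.values()) / len(per_class)            # plain mean over classes
    micro = f1(*(sum(col) for col in zip(*counts.values())))    # pool counts, compute once
    weighted = sum(per_class[c] * support[c] for c in counts) / total
    return macro, micro, weighted

# Hypothetical counts for three classes of size 3, 2, and 1.
counts = {"cat": (2, 1, 1), "dog": (2, 1, 0), "bird": (0, 0, 1)}
macro, micro, weighted = averaged_f1(counts)
print(f"macro={macro:.3f} micro={micro:.3f} weighted={weighted:.3f}")
# macro=0.489 micro=0.667 weighted=0.600
```

The single missed "bird" instance pulls macro well below the other two, because that one-instance class counts as a full third of the macro mean.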

When they differ

On imbalanced data, macro and micro can diverge sharply.

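  • Macro highlights poor performance on rare classes, since a tiny class counts as much as a huge one.
  • Micro reflects overall instance-level performance. In single-label multiclass, micro-averaged precision, recall, and F1 all equal accuracy.
  • Weighted typically sits between the two: it is biased toward frequent classes, though usually less so than micro. For recall the two coincide, since support-weighted recall also reduces to accuracy.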
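A small worked case may help show the gap. The sketch below assumes a made-up dataset of 95 "common" and 5 "rare" instances and a hypothetical model that always predicts "common"; all names and numbers are invented for illustration.

```python
from collections import Counter

y_true = ["common"] * 95 + ["rare"] * 5
y_pred = ["common"] * 100  # hypothetical majority-class model

tp, fp, fn = Counter(), Counter(), Counter()
for t, p in zip(y_true, y_pred):
    if t == p:
        tp[t] += 1
    else:
        fp[p] += 1
        fn[t] += 1

labels = sorted(set(y_true))
support = Counter(y_true)

# Per-class F1 = 2TP / (2TP + FP + FN); every label here has support > 0.
f1 = {c: 2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels}

macro = sum(f1.values()) / len(labels)
pooled_tp = sum(tp.values())
micro = 2 * pooled_tp / (2 * pooled_tp + sum(fp.values()) + sum(fn.values()))
weighted = sum(f1[c] * support[c] for c in labels) / len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"macro={macro:.3f} weighted={weighted:.3f} micro={micro:.3f} accuracy={accuracy:.3f}")
# macro=0.487 weighted=0.926 micro=0.950 accuracy=0.950
```

The model never finds a single rare instance, yet only macro makes that visible. Micro matches accuracy because, in single-label multiclass, each error is one false positive for the predicted class and one false negative for the true class, so the pooled counts balance.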

Choosing

If rare classes are important, report macro. If overall per-instance accuracy is what matters, micro is fine. Always state which one you used, because the same model can look very different under each.

Key idea

Macro treats classes equally, micro treats instances equally, and weighted is a support-based blend. Name your averaging scheme, or the number is ambiguous.

Check yourself

1. Which average gives a tiny rare class the same weight as a huge class?

2. In single-label multiclass, micro-averaged F1 equals what?