Machine Learning

Class Imbalance Handling

When one class is rare, level the field so the model still learns it.

5 min read · core

The problem

When one class hugely outnumbers another, a model can score high accuracy by ignoring the rare class entirely. Class imbalance therefore makes accuracy misleading: it hides the thing you usually care about, which is detecting the rare event.
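To see the problem in numbers, here is a minimal plain-Python sketch with made-up labels (990 negatives, 10 positives) and a hypothetical "model" that always predicts the majority class. The helpers `accuracy` and `recall` are illustrative names, not a library API.

```python
# Hypothetical toy labels: 990 negatives (0) and 10 positives (1).
y_true = [0] * 990 + [1] * 10

# A "model" that ignores the rare class and always predicts 0.
y_pred = [0] * len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model flagged as positive."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_on_positives) / len(preds_on_positives)

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.99
print(f"recall   = {recall(y_true, y_pred):.2f}")    # recall   = 0.00
```

The score looks excellent, yet not a single rare event is caught.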

Resampling

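  • Oversampling duplicates minority examples or synthesizes new ones, for example by interpolating between nearby minority points as SMOTE does.
  • Undersampling drops majority examples to balance the class counts, at the risk of discarding useful information.
  • Resample only the training split, after splitting, so duplicated or synthetic points never leak into validation or test data and evaluation still reflects the real class distribution.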
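The resampling steps above can be sketched in plain Python. This is a minimal illustration on hypothetical toy data, assuming binary 0/1 labels and list-of-lists features; `stratified_split`, `random_oversample`, `random_undersample`, and `interpolate_minority` are made-up helper names. The interpolation helper is a simplified SMOTE-style variant that picks a random minority partner rather than a nearest neighbour.

```python
import random
from collections import Counter

def stratified_split(y, test_frac=0.2, seed=0):
    """Split indices class by class so the rare class appears in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)

def random_oversample(X, y, seed=0):
    """Duplicate random rows of each smaller class until it matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep += rng.sample(idx, target)
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]

def interpolate_minority(X_min, n_new, seed=0):
    """SMOTE-style sketch: each synthetic point lies on the segment between two
    random minority points. (Real SMOTE picks the partner among the k nearest
    minority neighbours; this simplification does not.)"""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(X_min, 2)
        u = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([ai + u * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

# Hypothetical toy data: 100 rows, one feature, 5 positives (label 1).
X = [[float(i)] for i in range(100)]
y = [1 if i % 20 == 0 else 0 for i in range(100)]

# 1) Split first.  2) Resample only the training part.  3) Leave the test part alone.
train, test = stratified_split(y)
X_train, y_train = [X[i] for i in train], [y[i] for i in train]
X_test, y_test = [X[i] for i in test], [y[i] for i in test]

X_over, y_over = random_oversample(X_train, y_train)
X_under, y_under = random_undersample(X_train, y_train)
X_min = [x for x, lab in zip(X_train, y_train) if lab == 1]
X_synth = interpolate_minority(X_min, n_new=10)

print("train", Counter(y_train))  # 76 negatives, 4 positives
print("over ", Counter(y_over))   # 76 / 76
print("under", Counter(y_under))  # 4 / 4
print("test ", Counter(y_test))   # untouched: 19 / 1
```

In practice a library such as imbalanced-learn provides tested samplers (including SMOTE); the point here is the order of operations: split, then resample the training part only.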

Reweighting

  • Give the minority class a larger loss weight so its mistakes cost more.
  • Many algorithms accept class weights directly, so no resampling is needed.
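A sketch of reweighting, assuming binary 0/1 labels. `balanced_class_weights` uses the inverse-frequency rule n_samples / (n_classes × count_c), the same formula scikit-learn applies for `class_weight="balanced"`; `weighted_log_loss` is a hypothetical helper that scales each example's cross-entropy by the weight of its true class.

```python
import math
from collections import Counter

def balanced_class_weights(y):
    """w_c = n_samples / (n_classes * count_c), so rarer classes get larger weights."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}

def weighted_log_loss(y_true, p_pred, weights, eps=1e-12):
    """Binary cross-entropy with each example scaled by the weight of its true class.
    p_pred holds predicted probabilities of class 1; the result is a weighted mean."""
    total, weight_sum = 0.0, 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        loss = -math.log(p) if t == 1 else -math.log(1 - p)
        total += weights[t] * loss
        weight_sum += weights[t]
    return total / weight_sum

# Hypothetical labels: 90 negatives, 10 positives.
y = [0] * 90 + [1] * 10
w = balanced_class_weights(y)
print(w)  # class 0 -> about 0.56, class 1 -> 5.0

# A missed positive (p=0.1 for a true 1) and a false alarm (p=0.9 for a true 0)
# have roughly the same unweighted loss, but very different weighted costs.
miss = w[1] * -math.log(0.1)
false_alarm = w[0] * -math.log(1 - 0.9)
print(f"missed positive costs {miss:.2f}, false alarm costs {false_alarm:.2f}")
```

With this rule the ratio between class weights equals the ratio between class counts (here 9 to 1), so a confident miss on the rare class costs about nine times as much as an equally confident false alarm. In scikit-learn, many estimators such as `LogisticRegression` accept `class_weight="balanced"` directly.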

Metrics and thresholds

  • Track precision, recall, and the area under the precision-recall curve, not raw accuracy.
  • Tune the decision threshold instead of defaulting to 0.5: lowering it flags more cases as the rare class, trading precision for recall.
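The metrics and threshold tuning can be sketched on hypothetical validation scores. The helpers below are illustrative, not a library API: `precision_recall_at` evaluates one threshold, `average_precision` computes a step-wise area under the precision-recall curve (assuming no tied scores), and `pick_threshold` returns the highest threshold that reaches a target recall.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is predicted as class 1."""
    tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
    fn = sum(1 for t, s in zip(y_true, scores) if s < threshold and t == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(y_true, scores):
    """Area under the step-wise PR curve: mean of the precision at the rank of each
    true positive. Assumes no tied scores and at least one positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp, total = 0, 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            tp += 1
            total += tp / rank  # precision among the top `rank` scores
    return total / sum(y_true)

def pick_threshold(y_true, scores, min_recall):
    """Highest observed score that, used as the threshold, reaches min_recall."""
    for threshold in sorted(set(scores), reverse=True):
        _, r = precision_recall_at(y_true, scores, threshold)
        if r >= min_recall:
            return threshold
    return min(scores)

# Hypothetical validation scores from some model; 3 of 10 examples are positive.
y_val = [0, 0, 0, 1, 0, 0, 0, 1, 0, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.60, 0.80]

for threshold in (0.5, 0.3, 0.2):
    p, r = precision_recall_at(y_val, scores, threshold)
    print(f"threshold={threshold}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.5  precision=0.50  recall=0.33
# threshold=0.3  precision=0.40  recall=0.67
# threshold=0.2  precision=0.43  recall=1.00

print(f"average precision = {average_precision(y_val, scores):.3f}")
print("threshold for recall >= 0.6:", pick_threshold(y_val, scores, 0.6))
```

Lowering the threshold from 0.5 to 0.2 raises recall from 0.33 to 1.00 at some cost in precision. Choose the threshold on validation data and report final numbers on a held-out test set; otherwise the threshold itself leaks test information.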

Key idea

Class imbalance lets a model ignore the rare class. Resampling, class reweighting, and threshold tuning counter this, and their effect should be judged by precision and recall rather than accuracy.

Check yourself

1. Why does accuracy mislead under heavy class imbalance?

2. What does class reweighting do?

3. Where should resampling be applied?