Handling Class Imbalance

The problem

When one class vastly outnumbers another, a model can score high accuracy by ignoring the rare class entirely. If 99% of examples are negative, always predicting "negative" scores 99% accuracy while catching none of the rare events, and detecting those events is usually the whole point.
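A minimal illustration of this, using made-up labels (990 negatives, 10 positives) and a trivial "model" that always predicts the majority class. Nothing here comes from a real dataset:

```python
# Hypothetical labels: 990 majority (0) and 10 minority (1) examples.
y_true = [0] * 990 + [1] * 10
# A "model" that ignores the rare class and always predicts 0.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)  # fraction of real positives that were caught

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")  # accuracy=0.99 recall=0.00
```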

Resampling

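Oversampling duplicates minority examples at random or synthesizes new ones, for example with interpolation methods such as SMOTE, which place new points between a minority example and its nearest minority neighbors.
Undersampling drops majority examples to balance the counts, at the risk of discarding useful information.
Resample only the training split, and only after splitting. Resampling first lets copies or interpolations of evaluation examples leak into training, and leaves the evaluation set with an artificial class balance the model will never see in practice.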
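The sketch below shows the ordering in plain Python: split first (stratified, so both splits keep some minority rows), then resample the training portion only. The data, the function names, and the 20% test fraction are hypothetical. `interpolate_oversample` is a simplified SMOTE-like sampler that uses only the single nearest minority neighbor, not the full SMOTE algorithm; libraries such as imbalanced-learn provide tested implementations.

```python
import random

def stratified_split(X, y, test_frac, rng):
    """Hold out test_frac of EACH class before any resampling."""
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        n_test = int(len(idx) * test_frac)
        test += idx[:n_test]
        train += idx[n_test:]

    def pick(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return pick(train), pick(test)

def random_oversample(X, y, minority, rng):
    """Duplicate random minority rows until both classes have equal counts."""
    minor = [x for x, t in zip(X, y) if t == minority]
    n_major = sum(t != minority for t in y)
    extra = [rng.choice(minor) for _ in range(n_major - len(minor))]
    return X + extra, y + [minority] * len(extra)

def interpolate_oversample(X, y, minority, rng):
    """SMOTE-style sketch (k=1): each synthetic row lies on the segment
    between a random minority row and its nearest other minority row."""
    minor = [x for x, t in zip(X, y) if t == minority]
    n_major = sum(t != minority for t in y)
    synth = []
    for _ in range(n_major - len(minor)):
        a = rng.choice(minor)
        # Nearest other minority row by squared Euclidean distance.
        b = min((m for m in minor if m is not a),
                key=lambda m: sum((ai - mi) ** 2 for ai, mi in zip(a, m)))
        lam = rng.random()  # position along the segment a -> b
        synth.append([ai + lam * (bi - ai) for ai, bi in zip(a, b)])
    return X + synth, y + [minority] * len(synth)

def random_undersample(X, y, minority, rng):
    """Keep all minority rows plus an equal-size random subset of the rest."""
    minor = [(x, t) for x, t in zip(X, y) if t == minority]
    major = [(x, t) for x, t in zip(X, y) if t != minority]
    kept = minor + rng.sample(major, len(minor))
    return [x for x, _ in kept], [t for _, t in kept]

rng = random.Random(0)
# Hypothetical 2-D data: 90 majority rows around (0, 0), 10 minority rows around (3, 3).
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
     + [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(10)])
y = [0] * 90 + [1] * 10

(X_tr, y_tr), (X_te, y_te) = stratified_split(X, y, 0.2, rng)
X_os, y_os = random_oversample(X_tr, y_tr, 1, rng)
X_sm, y_sm = interpolate_oversample(X_tr, y_tr, 1, rng)
X_us, y_us = random_undersample(X_tr, y_tr, 1, rng)

print(y_tr.count(1), y_tr.count(0))  # 8 72
print(y_os.count(1), y_os.count(0))  # 72 72
print(y_us.count(1), y_us.count(0))  # 8 8
print(y_te.count(1), y_te.count(0))  # 2 18 (never resampled)
```

Only the training lists change size; the held-out rows keep the original imbalance, which is what a deployed model would face.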

Reweighting

Give the minority class a larger loss weight so its mistakes cost more.
Many algorithms accept class weights directly, so no resampling is needed.
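A minimal sketch of reweighting, assuming a binary classifier that outputs probabilities. `balanced_class_weights` and `weighted_log_loss` are illustrative helpers, not a library API; the weighting formula matches the one scikit-learn documents for `class_weight="balanced"`.

```python
import math
from collections import Counter

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * n_class_samples),
    so rarer classes get proportionally larger weights."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_log_loss(y, p, weights, eps=1e-12):
    """Binary cross-entropy where each example's term is scaled by the
    weight of its true class, averaged over the total weight."""
    total = norm = 0.0
    for t, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)  # clip to keep log() finite
        nll = -math.log(pi) if t == 1 else -math.log(1 - pi)
        total += weights[t] * nll
        norm += weights[t]
    return total / norm

# Hypothetical labels: 90 negatives, 10 positives.
y = [0] * 90 + [1] * 10
w = balanced_class_weights(y)  # w[1] == 5.0, w[0] is about 0.556
uniform = {0: 1.0, 1: 1.0}

# A lazy model that gives every example a 10% chance of being positive.
p = [0.1] * len(y)
print(weighted_log_loss(y, p, uniform))  # loss when all mistakes cost the same
print(weighted_log_loss(y, p, w))        # larger: missed positives now dominate
```

The lazy model looks acceptable under uniform weights but is penalized heavily once each missed positive counts nine times as much as a negative (5.0 versus about 0.556), which is the pressure that pushes training toward the rare class.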

Metrics and thresholds

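Track precision, recall, and the area under the precision-recall curve, not raw accuracy.
Tune the decision threshold toward the rare class: for a classifier that outputs probabilities, this usually means lowering it below the default 0.5 so more borderline cases are flagged as positive, with the value chosen on validation data.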
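A plain-Python sketch of these metrics and of threshold tuning on a held-out validation set. The scores are invented, the helper names are hypothetical, and maximizing F1 is only one possible selection criterion. `average_precision` computes the common step-wise summary of the area under the precision-recall curve (mean precision at the rank of each true positive).

```python
def precision_recall(y, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    tp = sum(1 for t, s in zip(y, scores) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y, scores) if s >= threshold and t == 0)
    fn = sum(1 for t, s in zip(y, scores) if s < threshold and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(y, scores):
    """Step-wise area under the PR curve: mean precision at each true
    positive, ranking by descending score (ties keep input order)."""
    ranked = sorted(zip(scores, y), key=lambda z: -z[0])
    hits, total = 0, 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def best_threshold(y, scores):
    """Return the candidate threshold with the highest F1 on validation data
    (falls back to 0.5 if no threshold yields a positive F1)."""
    best_f1, best_thr = 0.0, 0.5
    for thr in sorted(set(scores)):
        p, r = precision_recall(y, scores, thr)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:
            best_f1, best_thr = f1, thr
    return best_thr

# Hypothetical validation labels and model scores (2 positives out of 10).
y_val = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.45, 0.40, 0.60]

print(precision_recall(y_val, scores, 0.5))  # (1.0, 0.5)
thr = best_threshold(y_val, scores)
print(thr, precision_recall(y_val, scores, thr))  # 0.4, precision 2/3, recall 1.0
print(average_precision(y_val, scores))  # about 0.833
```

Here lowering the threshold from 0.5 to 0.4 trades some precision (1.0 down to about 0.67) for full recall (0.5 up to 1.0). The threshold is picked on validation data so that the test set still gives an unbiased estimate.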

Key idea

Class imbalance lets a model ignore the rare class. Resampling, class reweighting, and threshold tuning restore balance, judged by precision and recall rather than accuracy.