Machine Learning

Outlier Detection and Treatment

Spot extreme values with statistical rules and decide whether to keep, cap, or remove them.

An outlier is a value far from the bulk of the data. Some are errors, others are rare but genuine events, so detection and treatment require judgment.

Detecting outliers

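  • The z-score rule flags points more than a chosen number of standard deviations from the mean (3 is a common cutoff). It suits roughly normal data, but the mean and standard deviation are themselves pulled by extreme values.
  • The interquartile range (IQR) rule flags points below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 - Q1. Quartiles barely move when a few values are extreme, so the rule is robust to skew.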
  • Model-based methods such as isolation forests score how isolated a point is in feature space, which helps when a point is unusual only in combination with other features.
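
The two statistical rules above can be sketched with Python's standard library. This is a minimal illustration, not a reference implementation: the function names, the `readings` sample, the default cutoffs (3 standard deviations, 1.5 × IQR), and the choice of the "inclusive" quantile method are all assumptions made for the example.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds threshold (assumed default: 3)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing to flag
    return [v for v in values if abs(v - mean) / std > threshold]

def iqr_outliers(values, k=1.5):
    """Return values below Q1 - k*IQR or above Q3 + k*IQR (assumed default: k = 1.5)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample: nine ordinary readings and one extreme value.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 13, 95]

print(zscore_outliers(readings))
print(iqr_outliers(readings))
```

On a sample this small, the extreme value inflates the standard deviation enough that its own z-score stays just under 3, so the z-score rule misses it while the IQR rule flags it. That is one reason to prefer quartile-based rules when the data is skewed or the sample is small.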

Treating outliers

  • Remove them when they are clear data entry errors.
  • Cap or winsorize by clipping values to a percentile threshold.
  • Transform with a log to compress a long right tail; this requires positive values (or log(1 + x) when zeros are present).
  • Keep them when they represent the rare cases you actually want to model.

Blindly deleting outliers can erase the very signal that matters, such as fraud or equipment failure. Always ask whether an extreme value is noise or information before acting.

Key idea

Detect outliers with the z-score rule, the IQR rule, or model-based methods, then decide whether to remove, cap, transform, or keep them based on whether they are noise or genuine signal.

Check yourself

1. Why is the IQR rule preferred over the z score for skewed data?

2. What does winsorizing an outlier do?

3. When should you keep outliers rather than remove them?