Machine Learning

Anomaly Detection With Isolation Forest

Spotting outliers as points that are easy to isolate with random splits.

The isolation forest detects anomalies with a clever idea: outliers are few and different, so they are easy to isolate. The method needs no labels and scales well to large datasets.

Isolating points with random trees

The algorithm builds many random trees. To build one tree it repeatedly:

  • picks a random feature, and
  • picks a random split value between that feature's minimum and maximum among the points in the current node.

Repeating these splits recursively partitions the data until each point lands alone in a leaf. A point's path length is the number of splits between the root and that leaf.
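
The tree-building steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the names build_tree and path_length, the dictionary node layout, the depth cap, and the toy dataset are all assumptions made for the example. It also only picks among features that still vary inside a node, a small simplification so that every split can make progress.

```python
import random


def build_tree(points, rng, depth=0, max_depth=30):
    """Grow one isolation tree using random axis-aligned splits."""
    # Leaf: the point is isolated (or the node is empty), or the depth cap is hit.
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}

    # Assumption of this sketch: only consider features whose values differ
    # within the node, so a split can actually separate something.
    n_features = len(points[0])
    splittable = [
        f for f in range(n_features)
        if min(p[f] for p in points) < max(p[f] for p in points)
    ]
    if not splittable:
        return {"size": len(points)}  # exact duplicates cannot be separated

    # Pick a random feature, then a random split value inside its range.
    feature = rng.choice(splittable)
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    split = rng.uniform(lo, hi)

    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }


def path_length(tree, point):
    """Count the splits between the root and the leaf that holds `point`."""
    depth = 0
    while "feature" in tree:
        side = "left" if point[tree["feature"]] < tree["split"] else "right"
        tree = tree[side]
        depth += 1
    return depth


# Toy data (hypothetical): a tight cluster near the origin plus one far point.
rng = random.Random(0)
inlier = (0.0, 0.0)
outlier = (10.0, 10.0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)] + [inlier, outlier]

forest = [build_tree(data, rng) for _ in range(100)]
inlier_avg = sum(path_length(t, inlier) for t in forest) / len(forest)
outlier_avg = sum(path_length(t, outlier) for t in forest) / len(forest)
print(f"inlier: {inlier_avg:.2f} splits, outlier: {outlier_avg:.2f} splits")
```

On this toy data the far point should need noticeably fewer splits on average than the point at the center of the cluster.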

Anomaly score from path length

Normal points sit in dense regions and need many splits to isolate, giving long paths. Anomalies sit apart and get separated quickly, giving short paths. Averaging path lengths across the whole forest produces an anomaly score: short average path means likely anomaly.
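
The lesson does not say how the average path length becomes a score, so the sketch below assumes the normalization used in the original isolation forest paper: the average path is divided by c(n), the expected path length of an unsuccessful binary search tree lookup over n points, and mapped into the range 0 to 1. The function names and the sample path lengths are made up for illustration.

```python
import math

EULER_GAMMA = 0.5772156649  # used to approximate harmonic numbers


def expected_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # H(n-1) is about ln(n-1) + gamma
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(path_lengths, n):
    """Map one point's path lengths across the forest to a score in (0, 1].

    Scores near 1 mean short paths (likely anomaly); scores well below 0.5
    mean long paths (likely normal).
    """
    mean_path = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_path / expected_path_length(n))


# Hypothetical path lengths for two points, each measured in three trees
# that were built on n = 256 points.
short_paths = [1, 2, 1]
long_paths = [10, 12, 11]
print(anomaly_score(short_paths, 256), anomaly_score(long_paths, 256))
```

With this normalization, a point whose average path equals c(n) scores exactly 0.5, which gives a natural reference for "about as hard to isolate as a typical point".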

Why it works well

  • It targets anomalies directly rather than modeling normal data in full.
  • Its cost grows roughly linearly with the number of points.
  • It handles many features without computing distances.

A contamination parameter, the expected fraction of anomalies in the data, sets the score threshold and therefore how many points get flagged. It is usually small.
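
As a rough sketch of how a contamination value can turn scores into flags, the hypothetical helper below marks the highest-scoring fraction of points as anomalies. The function name, the scores, and the rounding rule are assumptions for illustration. Libraries such as scikit-learn expose a contamination setting on their isolation forest, but their exact thresholding may differ from this.

```python
def flag_anomalies(scores, contamination=0.05):
    """Flag roughly the top `contamination` fraction of scores as anomalies.

    Assumes higher score means more anomalous. Ties at the threshold are all
    flagged, so slightly more points than requested may be marked.
    """
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination should be in (0, 0.5]")
    n_flag = max(1, round(contamination * len(scores)))
    # The threshold is the n_flag-th largest score.
    threshold = sorted(scores, reverse=True)[n_flag - 1]
    return [s >= threshold for s in scores]


# Made-up anomaly scores for ten points; two of them stand out.
scores = [0.41, 0.44, 0.39, 0.78, 0.42, 0.45, 0.40, 0.43, 0.71, 0.38]
flags = flag_anomalies(scores, contamination=0.2)
print(flags)
```

Raising contamination lowers the threshold and flags more points, so it is best set from a realistic estimate of how rare anomalies are.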

Key idea

An isolation forest flags anomalies by random partitioning, since outliers are isolated in few splits and have short average path lengths.

Check yourself

1. Why are anomalies isolated quickly in an isolation forest?

2. What signals an anomaly in this method?

3. What is a scaling advantage of the isolation forest?