The Decision Tree Pruning Recap

Trimming an overgrown tree with pre-pruning limits and cost-complexity post-pruning.

Trees overfit by growing too deep

A decision tree can keep splitting until every leaf is pure, memorizing the training data. Pruning controls this complexity so the tree generalizes.

Two pruning styles

Pre-pruning stops growth early using limits such as a maximum depth, a minimum number of samples per leaf, or a minimum impurity decrease required to allow a split.
Post-pruning grows a full tree, then collapses branches that do not help on held-out data.
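
As a rough illustration of the pre-pruning limits above, the sketch below checks one candidate split against all three. The function name allow_split, the default thresholds, and the use of Gini impurity are assumptions made for this example; they are not taken from any particular library, and the impurity decrease is measured at the node itself rather than weighted by the node's share of the data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def allow_split(parent, left, right, depth,
                max_depth=3, min_samples_leaf=2, min_impurity_decrease=0.01):
    """Return True only if a candidate split passes every pre-pruning limit.

    Hypothetical helper: the thresholds are illustrative defaults.
    """
    # Limit 1: do not grow below the maximum depth.
    if depth >= max_depth:
        return False
    # Limit 2: both children must keep enough samples.
    if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
        return False
    # Limit 3: the split must reduce impurity by at least the threshold.
    n = len(parent)
    child_impurity = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - child_impurity >= min_impurity_decrease
```

A split that separates the classes cleanly passes at a shallow depth, while the same split is refused once the depth limit is reached or when one child would hold too few samples.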

Cost complexity pruning

The most common post-pruning method, cost-complexity pruning, scores each subtree by its training error plus a penalty proportional to its number of leaves. A tuning parameter called alpha sets the penalty per leaf and so controls the tradeoff between fit and size. Raising alpha removes more branches, producing a sequence of nested subtrees from which cross-validation picks the best.
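
One way to see how the nested sequence arises is the weakest-link view: for each internal node, compute the alpha at which collapsing it into a leaf costs nothing, namely (error as a leaf minus error of its subtree) divided by (subtree leaves minus 1), then collapse the node with the smallest value and repeat. The sketch below assumes a toy binary tree whose Node class and error counts are invented for illustration; it collapses one node at a time and does not handle ties specially.

```python
class Node:
    """Toy binary tree node; `error` is the training error if this node were a leaf."""

    def __init__(self, error, left=None, right=None):
        self.error = error
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None


def subtree_stats(node):
    """Return (total leaf error, number of leaves) for the subtree rooted at node."""
    if node.is_leaf():
        return node.error, 1
    left_err, left_n = subtree_stats(node.left)
    right_err, right_n = subtree_stats(node.right)
    return left_err + right_err, left_n + right_n


def weakest_link(node):
    """Return (effective alpha, node) for the internal node that is cheapest to collapse."""
    if node.is_leaf():
        return None
    err, leaves = subtree_stats(node)
    best = ((node.error - err) / (leaves - 1), node)
    for child in (node.left, node.right):
        candidate = weakest_link(child)
        if candidate is not None and candidate[0] < best[0]:
            best = candidate
    return best


def pruning_path(root):
    """Collapse weakest links until only the root remains.

    Returns a list of (alpha, leaves remaining) pairs. Mutates the tree.
    """
    path = [(0.0, subtree_stats(root)[1])]
    while not root.is_leaf():
        alpha, node = weakest_link(root)
        node.left = node.right = None  # collapse this branch into a leaf
        path.append((alpha, subtree_stats(root)[1]))
    return path


# Made-up error counts: the left branch barely helps, so it is pruned first.
tree = Node(10, left=Node(4, Node(1), Node(2)), right=Node(3))
path = pruning_path(tree)
print(path)  # [(0.0, 3), (1.0, 2), (3.0, 1)]
```

Each entry in the path is one candidate subtree; cross-validation would then score the alpha values and keep the subtree that generalizes best.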

Why prune

A pruned tree has lower variance and is easier to read.
Pre-pruning is cheaper but can stop too early, missing a good later split.
Post-pruning is usually more reliable because it judges each branch by its actual validation gain after the full tree has been grown.

Key idea

Pre-pruning halts growth early with depth and sample limits, while post-pruning grows a full tree and then trims weak branches via cost-complexity pruning, tuning alpha by cross-validation to lower variance.
