Pruning Decision Trees
A fully grown decision tree can split until every leaf is pure, which usually means it has memorized noise. Pruning trims the tree so it generalizes to new data instead of fitting every quirk.
Why overgrown trees fail
Deep leaves often hold only a handful of training points, so their predictions reflect random fluctuations rather than real structure. The result is high variance: near-perfect training accuracy but poor test accuracy.
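To put a number on "a handful of training points", here is a small sketch. It assumes, as a simplification, that a leaf predicts the fraction of its training points in a class and that those points are an independent sample. Under that assumption the estimate behaves like a binomial proportion with standard error sqrt(p(1 - p)/n). The helper name `leaf_std_error` is illustrative.

```python
import math


def leaf_std_error(p: float, n: int) -> float:
    """Standard error of a class-fraction estimate from n independent points,
    treating the leaf's class count as binomial with true rate p (an assumption)."""
    return math.sqrt(p * (1.0 - p) / n)


for n in (2, 20, 200):
    print(n, round(leaf_std_error(0.5, n), 3))
# 2 0.354
# 20 0.112
# 200 0.035
```

With a true rate of 0.5, a two-point leaf is typically off by about 0.35, while a 200-point leaf is off by about 0.035. Small, deep leaves therefore mostly track noise.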
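Pre-pruning
Pre-pruning stops growth early using limits such as a maximum depth, a minimum number of samples required to split a node, or a minimum impurity decrease per split. It is simple and cheap, but it can stop too soon: a split that looks weak on its own may set up strong splits beneath it, and a rule that judges each split in isolation never gets to see them.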
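The sketch below shows a hypothetical pre-pruning rule and the classic XOR case where it stops too soon. The function names and the default limits (`max_depth=3`, `min_samples_split=2`, `min_impurity_decrease=0.01`) are illustrative assumptions, not any library's defaults. Impurity is measured with Gini impurity, one common choice.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(X: List[Point], y: List[int], feature: int, threshold: float) -> float:
    """Parent impurity minus the size-weighted impurity of the two children."""
    left = [yi for xi, yi in zip(X, y) if xi[feature] <= threshold]
    right = [yi for xi, yi in zip(X, y) if xi[feature] > threshold]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
    return gini(y) - weighted


def should_split(n_samples: int, depth: int, best_decrease: float,
                 max_depth: int = 3, min_samples_split: int = 2,
                 min_impurity_decrease: float = 0.01) -> bool:
    """Hypothetical pre-pruning rule: split only if no limit is violated."""
    if depth >= max_depth or n_samples < min_samples_split:
        return False
    return best_decrease >= min_impurity_decrease


# XOR: the label is 1 exactly when the two features differ.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [0, 1, 1, 0]

root_best = max(impurity_decrease(X, y, f, 0.5) for f in (0, 1))
print(root_best, should_split(len(X), 0, root_best))  # 0.0 False

# Had we split on feature 0 anyway, the left child separates perfectly on feature 1.
X_left, y_left = X[:2], y[:2]
print(impurity_decrease(X_left, y_left, 1, 0.5))  # 0.5
```

On XOR, neither feature reduces impurity at the root, so the rule refuses to split. Yet one more level of splits would separate the classes perfectly.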
Post-pruning
Post-pruning grows the full tree first, then removes branches that do not help. Cost-complexity pruning adds a fixed penalty for each leaf and collapses subtrees whose reduction in training error does not justify their extra leaves. The penalty strength, usually called alpha, is tuned with validation data.
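Cost-complexity pruning is usually stated as minimizing R_alpha(T) = R(T) + alpha * |leaves(T)|, where R(T) is the training error summed over the leaves of T. The sketch below works on a made-up hand-built tree and assumes R is the misclassification rate. `prune` finds the minimizing subtree for a given alpha bottom-up. `weakest_link_alpha` computes the per-node threshold g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1) that weakest-link pruning uses to order the collapses. The `Node` class and all error values are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node. `error` is R(t): the fraction of all training
    samples this node would misclassify if it were a leaf."""
    error: float
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def n_leaves(node: Node) -> int:
    """|leaves(T_t)|: number of leaves in the subtree rooted at node."""
    if node.is_leaf():
        return 1
    return sum(n_leaves(c) for c in node.children)


def subtree_error(node: Node) -> float:
    """R(T_t): summed error of the leaves under node."""
    if node.is_leaf():
        return node.error
    return sum(subtree_error(c) for c in node.children)


def weakest_link_alpha(node: Node) -> float:
    """g(t) for an internal node: training error saved per extra leaf kept."""
    return (node.error - subtree_error(node)) / (n_leaves(node) - 1)


def prune(node: Node, alpha: float) -> Node:
    """Return the subtree minimizing R(T) + alpha * |leaves(T)|.

    Bottom-up: prune each child optimally, then collapse this node into a
    leaf if that costs no more than keeping the pruned children (ties favor
    the smaller tree).
    """
    if node.is_leaf():
        return Node(node.error)
    kids = [prune(c, alpha) for c in node.children]
    keep_cost = sum(subtree_error(k) + alpha * n_leaves(k) for k in kids)
    if node.error + alpha <= keep_cost:
        return Node(node.error)
    return Node(node.error, kids)


# Toy tree over 64 training samples: root -> [leaf L, node R -> [R1, R2]].
tree = Node(24 / 64, [Node(4 / 64), Node(12 / 64, [Node(4 / 64), Node(4 / 64)])])

print(weakest_link_alpha(tree.children[1]), weakest_link_alpha(tree))  # 0.0625 0.09375
for alpha in (0.03, 0.1, 0.2):
    print(alpha, n_leaves(prune(tree, alpha)))  # leaf counts: 3, then 2, then 1
```

Here the right branch collapses first, because its g (0.0625) is smaller than the root's (0.09375). Once that branch is gone, the root's own g rises to 0.125, so the tree becomes a single leaf for any alpha at or above 0.125. scikit-learn exposes the same penalty as the `ccp_alpha` parameter of its decision tree estimators. It measures R(T) as total sample-weighted leaf impurity rather than misclassification rate.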
- Pre-pruning is faster but can underfit.
- Post-pruning is more thorough because it sees the whole tree before deciding what to cut, but it costs more because the full tree must be grown first.
Choosing how much to prune
Cross-validation picks how aggressively to prune, for example the depth limit or the penalty strength, balancing a tree that is too simple and underfits against one that is too complex and overfits.
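The sketch below runs k-fold cross-validation over a depth limit for a tiny one-dimensional classification tree. Everything in it is an illustrative assumption: the toy data (a step at x = 20 with four flipped labels), the interleaved fold assignment, misclassification count as the split criterion, and the candidate depths. The same loop applies to post-pruning if the depth limit is swapped for the penalty strength.

```python
from typing import List


def majority(ys: List[int]) -> int:
    """Most common label; ties go to the smallest label."""
    return max(sorted(set(ys)), key=ys.count)


def fit(xs: List[float], ys: List[int], max_depth: int, depth: int = 0):
    """Grow a 1-D tree. A leaf is an int label; an internal node is
    (threshold, left_subtree, right_subtree). Growth stops at max_depth."""
    if depth == max_depth or len(set(ys)) == 1:
        return majority(ys)
    pts = sorted(zip(xs, ys))
    best = None  # (misclassified count, threshold)
    for (x0, _), (x1, _) in zip(pts, pts[1:]):
        if x0 == x1:
            continue
        t = (x0 + x1) / 2
        left = [y for x, y in pts if x <= t]
        right = [y for x, y in pts if x > t]
        errors = (len(left) - left.count(majority(left))
                  + len(right) - right.count(majority(right)))
        if best is None or errors < best[0]:
            best = (errors, t)
    if best is None:  # all x values identical: no valid threshold
        return majority(ys)
    t = best[1]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (t,
            fit([x for x, _ in left], [y for _, y in left], max_depth, depth + 1),
            fit([x for x, _ in right], [y for _, y in right], max_depth, depth + 1))


def predict(tree, x: float) -> int:
    """Walk from the root to a leaf label."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree


def cv_accuracy(xs: List[float], ys: List[int], max_depth: int, k: int = 5) -> float:
    """Mean validation accuracy over k interleaved folds (index i goes to fold i % k)."""
    scores = []
    for f in range(k):
        train = [i for i in range(len(xs)) if i % k != f]
        val = [i for i in range(len(xs)) if i % k == f]
        tree = fit([xs[i] for i in train], [ys[i] for i in train], max_depth)
        hits = sum(predict(tree, xs[i]) == ys[i] for i in val)
        scores.append(hits / len(val))
    return sum(scores) / k


xs = [float(x) for x in range(40)]
noisy = {3, 17, 26, 33}  # toy label noise
ys = [int(x >= 20) ^ (x in noisy) for x in range(40)]

candidates = [1, 2, 3, 4, 6, 8]
scores = {d: cv_accuracy(xs, ys, d) for d in candidates}
best_depth = max(candidates, key=lambda d: (scores[d], -d))  # ties favor shallower trees
print(best_depth, round(scores[best_depth], 3))
```

Ties are broken toward the shallower tree, the more heavily pruned choice. With scikit-learn, an analogous workflow is to take candidate values from `DecisionTreeClassifier.cost_complexity_pruning_path` and score each `ccp_alpha` with `cross_val_score`.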
Key idea
Pruning removes branches that fit noise, trading a little training accuracy for better generalization.