Machine Learning

The Decision Tree Pruning Recap

Trimming an overgrown tree with pre-pruning limits and cost-complexity post-pruning.

Trees overfit by growing too deep

A decision tree can keep splitting until every leaf is pure, memorizing the training data. Pruning controls this complexity so the tree generalizes.

Two pruning styles

  • Pre-pruning stops growth early using limits such as a maximum depth, a minimum number of samples per leaf, or a minimum impurity decrease required for a split.
  • Post-pruning grows a full tree, then collapses branches that do not help on held-out data.
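
The three pre-pruning limits can be sketched as a single check applied to each candidate split. This is a minimal illustration, not any library's implementation: the function names `gini` and `split_allowed` are hypothetical, the threshold values are arbitrary, and Gini impurity is assumed as the impurity measure.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_allowed(parent, left, right, depth,
                  max_depth=3, min_samples_leaf=2, min_impurity_decrease=0.05):
    """Return True only if a candidate split passes all three limits."""
    if depth >= max_depth:                      # depth limit
        return False
    if min(len(left), len(right)) < min_samples_leaf:  # leaf size limit
        return False
    n = len(parent)
    weighted_child = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    # impurity limit: the split must reduce impurity by at least the threshold
    return gini(parent) - weighted_child >= min_impurity_decrease

parent = ["a", "a", "a", "b", "b", "b"]
good = split_allowed(parent, ["a", "a", "a"], ["b", "b", "b"], depth=1)
too_deep = split_allowed(parent, ["a", "a", "a"], ["b", "b", "b"], depth=3)
tiny_leaf = split_allowed(parent, ["a"], ["a", "a", "b", "b", "b"], depth=1)
weak = split_allowed(parent, ["a", "b", "a"], ["b", "a", "b"], depth=1,
                     min_impurity_decrease=0.1)
print(good, too_deep, tiny_leaf, weak)  # True False False False
```

The parameter names mirror those of scikit-learn's `DecisionTreeClassifier` (`max_depth`, `min_samples_leaf`, `min_impurity_decrease`), which applies limits of this kind while the tree is being grown.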

Cost-complexity pruning

The most common post-pruning method, cost-complexity pruning, scores a subtree by its training error plus a penalty proportional to its number of leaves. A tuning parameter called alpha sets the weight of that penalty and so controls the tradeoff between fit and size. Raising alpha removes more branches, producing a sequence of nested subtrees from which cross-validation picks the best.
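
Written out, the penalized score of a subtree T is R(T) + alpha × (number of leaves of T), where R(T) is the total training error over its leaves. The sketch below applies that score bottom-up to a small hand-built tree. The tree, its error counts, and the helper names `leaves_and_error` and `prune` are all made up for illustration; each node stores the error it would have if it were turned into a leaf.

```python
def leaves_and_error(node):
    """Return (number of leaves, total leaf error) for the subtree at node."""
    if not node.get("children"):
        return 1, node["error"]
    parts = [leaves_and_error(child) for child in node["children"]]
    return sum(n for n, _ in parts), sum(e for _, e in parts)

def prune(node, alpha):
    """Collapse a node when being a single leaf costs no more than its subtree."""
    if not node.get("children"):
        return {"error": node["error"]}
    kept = {"error": node["error"],
            "children": [prune(child, alpha) for child in node["children"]]}
    n_leaves, subtree_error = leaves_and_error(kept)
    leaf_cost = node["error"] + alpha * 1
    subtree_cost = subtree_error + alpha * n_leaves
    return {"error": node["error"]} if leaf_cost <= subtree_cost else kept

# Hypothetical tree: "error" = misclassified training samples if the node were a leaf.
tree = {
    "error": 10,
    "children": [
        {"error": 4, "children": [{"error": 1}, {"error": 2}]},
        {"error": 5, "children": [{"error": 0}, {"error": 1}]},
    ],
}

sizes = [leaves_and_error(prune(tree, alpha))[0] for alpha in (0, 2, 3)]
print(sizes)  # [4, 3, 1]
```

As alpha rises from 0 to 2 to 3, the branch with the smallest error reduction is collapsed first and the whole tree eventually shrinks to a single leaf, which is the nested sequence of subtrees described above.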

Why prune

  • A pruned tree has lower variance and is easier to read.
  • Pre-pruning is cheaper but can stop too early, missing a good later split.
  • Post-pruning is usually more reliable because it judges branches by their actual validation gain.
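
Judging by validation performance can be sketched as choosing the alpha with the lowest average held-out error across folds. The numbers below are invented purely for illustration (misclassified samples per fold for each candidate alpha), and `fold_errors` and `best_alpha` are hypothetical names.

```python
from statistics import mean

# Hypothetical results: candidate alpha -> misclassified validation samples per fold.
fold_errors = {
    0.0:  [15, 14, 16],   # unpruned tree, overfits
    0.01: [11, 12, 10],
    0.05: [12, 13, 11],
    0.2:  [17, 18, 16],   # pruned too hard, underfits
}

# Pick the alpha whose mean validation error across folds is smallest.
best_alpha = min(fold_errors, key=lambda alpha: mean(fold_errors[alpha]))
print(best_alpha)  # 0.01
```

In practice each row would come from fitting a tree pruned at that alpha on the training folds and scoring it on the held-out fold.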

Key idea

Pre-pruning halts growth early with depth and sample limits, while post-pruning grows a full tree and then trims weak branches via cost-complexity pruning, tuning alpha by cross-validation to lower variance.

Check yourself

1. What does the alpha parameter in cost-complexity pruning control?

2. What is a drawback of pre-pruning compared with post-pruning?