Decision Tree Splitting Criteria
A decision tree classifies or predicts by asking a sequence of yes-or-no questions. Each internal node tests one feature, typically by comparing it against a threshold, and training is the search for good questions.
Greedy splitting
At every node the algorithm tries many candidate splits and keeps the one that best separates the data. It is greedy because it picks the locally best split without looking ahead to future splits.
- For a numeric feature it sorts the values and tests thresholds between adjacent distinct values, commonly their midpoints.
- For a categorical feature it tries groupings of the categories.
- It scores each candidate with an impurity measure and selects the candidate that reduces impurity the most.
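The numeric-feature search can be sketched in a few lines. This is a minimal illustration, not how any particular library does it: the names gini and best_threshold are hypothetical, Gini impurity is assumed as the scoring measure, and thresholds are assumed to be midpoints between adjacent distinct values.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, impurity_reduction) for the best binary split
    of one numeric feature, or (None, 0.0) if no split improves purity."""
    # Sort samples by feature value only, so labels never get compared.
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    parent_impurity = gini(labels)
    best_t, best_gain = None, 0.0

    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold fits between identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Weight each child's impurity by its share of the samples.
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent_impurity - weighted
        if gain > best_gain:
            best_t, best_gain = (low + high) / 2, gain

    return best_t, best_gain


if __name__ == "__main__":
    print(best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]))
```

This version recomputes both children's impurity from scratch at every candidate, which is quadratic in the number of samples. It is written that way for readability; an efficient implementation would update class counts incrementally while sweeping through the sorted values.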
What good means
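A split is good when the resulting child nodes are purer than the parent, meaning each child holds mostly one class (classification) or a tight range of target values (regression). The chosen criterion, such as Gini impurity or entropy for classification, turns impurity into a single number. Lower is purer, so the tree picks the split that most reduces it, weighting each child by its share of the samples.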
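To make the criterion concrete, here is a small sketch of the two classification measures and of the quantity a split is scored by. The function names are illustrative assumptions. Gini impurity is 1 minus the sum of squared class proportions, and entropy is the Shannon entropy of the class proportions, here in bits.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: 0 for a pure node, 1 for a 50/50 two-class node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(
        (count / n) * math.log2(count / n) for count in Counter(labels).values()
    )


def impurity_reduction(parent, left, right, measure=gini):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * measure(left) + (len(right) / n) * measure(right)
    return measure(parent) - weighted


if __name__ == "__main__":
    parent = ["a", "a", "b", "b"]
    print(gini(parent), entropy(parent))
    print(impurity_reduction(parent, ["a", "a"], ["b", "b"]))
```

A split that separates the classes perfectly recovers all of the parent's impurity, while a split whose children have the same class mix as the parent scores zero.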
Recursion and stopping
The algorithm applies the same procedure to each child, splitting again and again. Splitting stops at a node when it is pure, holds too few samples, or has reached the depth limit. That node then becomes a leaf that outputs a prediction, such as the majority class of its samples.
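The recursion and its stopping rules can be sketched as follows. This is a toy classifier under stated assumptions: numeric features only, Gini impurity as the criterion, nested dictionaries as the tree representation, and majority-class leaves. The names build, best_split, predict, max_depth, and min_samples are hypothetical choices for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Try every feature and midpoint threshold; return (feature, threshold, gain)."""
    n = len(labels)
    parent_impurity = gini(labels)
    best = (None, None, 0.0)
    for f in range(len(rows[0])):
        distinct = sorted({row[f] for row in rows})
        for low, high in zip(distinct, distinct[1:]):
            t = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent_impurity - weighted
            if gain > best[2]:
                best = (f, t, gain)
    return best


def build(rows, labels, depth=0, max_depth=3, min_samples=2):
    """Grow a tree recursively; stop on purity, small nodes, or the depth limit."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or len(labels) < min_samples or depth >= max_depth:
        return {"leaf": majority}

    f, t, _gain = best_split(rows, labels)
    if f is None:  # no candidate split reduces impurity
        return {"leaf": majority}

    left = [(row, y) for row, y in zip(rows, labels) if row[f] <= t]
    right = [(row, y) for row, y in zip(rows, labels) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build([r for r, _ in left], [y for _, y in left],
                      depth + 1, max_depth, min_samples),
        "right": build([r for r, _ in right], [y for _, y in right],
                       depth + 1, max_depth, min_samples),
    }


def predict(node, row):
    """Follow the yes-or-no tests from the root down to a leaf."""
    while "leaf" not in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


if __name__ == "__main__":
    tree = build([[1.0], [2.0], [3.0], [4.0]], ["a", "a", "b", "b"])
    print(tree)
    print(predict(tree, [1.5]), predict(tree, [3.5]))
```

The extra check for a split with no gain covers the case where no single greedy split improves purity even though the node is still mixed. The sketch turns such a node into a leaf rather than splitting arbitrarily.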
Key idea
A decision tree grows by greedily choosing the split that most reduces impurity at each node.