What it is
A decision tree predicts by asking a sequence of yes-or-no questions about the features. Each internal node typically tests one feature against a threshold (or, for a categorical feature, checks membership in a set of categories), and each leaf gives a prediction. The path from root to leaf is a chain of simple rules.
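The root-to-leaf walk described above can be sketched in a few lines. This is a minimal illustration, not a library implementation: the `Node` class, its field names, and the hand-built tree are all hypothetical, and only numeric threshold tests are covered.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical structure for illustration)."""
    feature: Optional[int] = None      # index of the feature tested at an internal node
    threshold: Optional[float] = None  # the question asked is: x[feature] <= threshold?
    left: Optional["Node"] = None      # branch taken when the answer is yes
    right: Optional["Node"] = None     # branch taken when the answer is no
    prediction: Optional[str] = None   # set only on leaves


def predict(node: Node, x: Sequence[float]) -> str:
    """Follow one root-to-leaf path and return the leaf's prediction."""
    while node.prediction is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


# A hand-built tree: "is feature 0 <= 2.5?", then "is feature 1 <= 1.0?"
tree = Node(
    feature=0, threshold=2.5,
    left=Node(prediction="A"),
    right=Node(
        feature=1, threshold=1.0,
        left=Node(prediction="B"),
        right=Node(prediction="C"),
    ),
)

print(predict(tree, [1.0, 5.0]))  # A
print(predict(tree, [3.0, 0.5]))  # B
print(predict(tree, [3.0, 2.0]))  # C
```

Each prediction corresponds to one readable rule, for example "feature 0 > 2.5 and feature 1 <= 1.0 gives B", which is what makes the model easy to inspect.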
How it learns
The tree grows greedily, from the top down. At each node it searches for the split that best separates the data reaching that node, without looking ahead to later splits. Split quality is measured by a purity criterion:
- Gini impurity or entropy for classification
- Variance reduction for regression
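The criteria listed above can be written out directly. The sketch below assumes the standard definitions: Gini impurity as 1 minus the sum of squared class proportions, entropy as the Shannon entropy in bits, and population variance for regression targets. The function names are chosen here for illustration.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def variance(values):
    """Population variance, the impurity used for regression targets."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


print(gini(["a", "a", "b", "b"]))     # 0.5
print(entropy(["a", "a", "b", "b"]))  # 1.0
print(gini(["a", "a", "a", "a"]))     # 0.0
print(variance([1.0, 3.0]))           # 1.0
```

All three measures are zero for a perfectly pure node and grow as the node becomes more mixed, so a good split is one whose children have low impurity.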
The tree keeps splitting until a node is pure or a stopping rule applies, such as a maximum depth or a minimum number of samples per leaf.
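The greedy procedure and its stopping rules can be sketched as follows. This is a simplified illustration under several assumptions: numeric features only, Gini impurity as the criterion, candidate thresholds at midpoints between consecutive distinct values, and a tree stored as nested dictionaries. The names `best_split`, `grow`, `max_depth`, and `min_samples_leaf` are chosen here for illustration.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, min_samples_leaf=1):
    """Try every feature and every midpoint threshold; return the split with
    the lowest weighted Gini impurity, or None if no split is allowed."""
    best = None  # (weighted_impurity, feature, threshold)
    n = len(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
                continue  # stopping rule: both leaves need enough samples
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def grow(X, y, depth=0, max_depth=3, min_samples_leaf=1):
    """Recursively grow a tree; leaves predict the majority class."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return {"leaf": majority}  # depth limit reached or node already pure
    split = best_split(X, y, min_samples_leaf)
    if split is None:
        return {"leaf": majority}
    _, f, t = split
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]

    def child(idx):
        return grow([X[i] for i in idx], [y[i] for i in idx],
                    depth + 1, max_depth, min_samples_leaf)

    return {"feature": f, "threshold": t, "left": child(left), "right": child(right)}


X = [[1.0, 7.0], [2.0, 3.0], [3.0, 8.0], [4.0, 2.0]]
y = ["no", "no", "yes", "yes"]
print(grow(X, y))
# {'feature': 0, 'threshold': 2.5, 'left': {'leaf': 'no'}, 'right': {'leaf': 'yes'}}
```

The search is greedy because `best_split` scores only the immediate children of the current node. It never checks whether a locally worse split would allow better splits further down.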
Strengths and weaknesses
- Easy to interpret and visualize
- Handles mixed feature types and needs little preprocessing
- Prone to overfitting if grown too deep, since a deep tree can memorize noise
Controlling complexity
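Depth limits stop growth early, and pruning removes branches after growth when they add little predictive value. Both curb overfitting. Even so, a single tree is often unstable, since small changes in the training data can produce a very different tree, and it is rarely the most accurate model on its own. This motivates ensembles like random forests, which combine many trees.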
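A toy example can make the effect of a depth limit concrete. The sketch below assumes a single numeric feature, a hand-made dataset whose true rule is `x >= 6`, and two deliberately flipped training labels standing in for noise. It demonstrates depth limits only, not pruning, and it scores the trees against the known clean rule rather than held-out data, which is possible only because the example is synthetic.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow(xs, ys, max_depth, depth=0):
    """Grow a tree on one numeric feature. A leaf is a plain label;
    an internal node is a (threshold, left, right) tuple."""
    majority = Counter(ys).most_common(1)[0][0]
    values = sorted(set(xs))
    if depth >= max_depth or len(set(ys)) == 1 or len(values) < 2:
        return majority
    best = None  # (size-weighted impurity, threshold)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            best = (score, t)
    t = best[1]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (
        t,
        grow([x for x, _ in left], [y for _, y in left], max_depth, depth + 1),
        grow([x for x, _ in right], [y for _, y in right], max_depth, depth + 1),
    )


def predict(tree, x):
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree


def count_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])


def accuracy(tree, xs, ys):
    return sum(predict(tree, x) == y for x, y in zip(xs, ys)) / len(ys)


xs = list(range(12))
clean = [int(x >= 6) for x in xs]  # the true underlying rule
noisy = clean.copy()
noisy[2], noisy[9] = 1, 0          # two flipped training labels (noise)

shallow = grow(xs, noisy, max_depth=1)
deep = grow(xs, noisy, max_depth=10)

for name, tree in [("shallow", shallow), ("deep", deep)]:
    print(name, count_leaves(tree),
          f"train={accuracy(tree, xs, noisy):.2f}",
          f"true={accuracy(tree, xs, clean):.2f}")
# shallow 2 train=0.83 true=1.00
# deep 6 train=1.00 true=0.83
```

The deep tree fits every training label, including the two flipped ones, and so gets the true rule wrong at exactly those points. The depth-limited tree ignores the noise and recovers the rule with a single split.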
Key idea
A decision tree splits data with greedy, purity-based questions, is highly interpretable, and needs depth limits or pruning to avoid memorizing noise.