Feature Importance from Trees
Tree models can rank inputs by how useful they are. Feature importance turns the splitting process into a score that says which features the model relied on.
Impurity-based importance
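The classic method, often called mean decrease in impurity (MDI), sums the impurity reduction each feature delivers across every split where it is used, weighting each split by the fraction of training samples that reach it. A feature used in high-impact splits near the root therefore scores high. In an ensemble such as a random forest, the per-tree scores are averaged, and the totals are usually normalized to sum to one.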
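The weighted sum can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the `Split` record and its field names are hypothetical (not any library's tree format), impurities are taken as already computed, and a single tree is scored. Each node contributes (N_t * i(t) - N_L * i(L) - N_R * i(R)) / N, which equals the node's impurity drop weighted by the fraction N_t / N of samples reaching it; the per-feature totals are then normalized to sum to one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Split:
    """One internal node of a fitted tree (hypothetical record format)."""
    feature: str
    n_node: int            # samples reaching this node
    impurity: float        # impurity of this node, e.g. Gini
    n_left: int            # samples sent to the left child
    impurity_left: float
    n_right: int           # samples sent to the right child
    impurity_right: float


def impurity_importance(splits, n_total):
    """Sum each feature's sample-weighted impurity decrease, then normalize."""
    scores = defaultdict(float)
    for s in splits:
        # (N_t/N) * (i(t) - (N_L/N_t) i(L) - (N_R/N_t) i(R)), written without the N_t cancellation
        decrease = (s.n_node * s.impurity
                    - s.n_left * s.impurity_left
                    - s.n_right * s.impurity_right) / n_total
        scores[s.feature] += decrease
    total = sum(scores.values())
    if total == 0:
        return dict(scores)
    return {f: v / total for f, v in scores.items()}


splits = [
    Split("age", 100, 0.5, 60, 0.3, 40, 0.2),     # root split
    Split("income", 60, 0.3, 30, 0.1, 30, 0.2),   # split of the root's left child
]
print({f: round(v, 2) for f, v in impurity_importance(splits, n_total=100).items()})
# roughly {'age': 0.73, 'income': 0.27}
```

The root split on `age` earns more credit because all 100 samples pass through it, while the `income` split only sees 60.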
A known bias
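Impurity importance favors features with many possible split points, such as continuous or high-cardinality categorical variables. Each extra candidate threshold is another opportunity to find a split that reduces impurity purely by chance, so a noisy feature can outrank a genuinely informative one. Because the scores are computed on the training data, they also reward splits that merely overfit.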
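The bias shows up in a small simulation. This is a sketch with a hypothetical setup, not a benchmark: labels are random coin flips, so no feature carries any signal, yet the best Gini split on a continuous noise feature (up to 99 candidate thresholds) typically reduces impurity more than the single available split on a binary noise feature.

```python
import random


def gini(labels):
    """Gini impurity for binary 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_gini_decrease(x, y):
    """Largest weighted Gini decrease over every threshold between distinct values of x."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    parent = gini(y)
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def mean_best_decrease(make_feature, trials=100, n=100, seed=0):
    """Average best-split gain of a noise feature against pure-noise labels."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        y = [rng.randint(0, 1) for _ in range(n)]  # labels carry no signal
        x = make_feature(rng, n)
        total += best_gini_decrease(x, y)
    return total / trials


continuous = mean_best_decrease(lambda rng, n: [rng.random() for _ in range(n)])
binary = mean_best_decrease(lambda rng, n: [rng.randint(0, 1) for _ in range(n)])
print(f"continuous noise: {continuous:.4f}, binary noise: {binary:.4f}")
```

Both features are pure noise, so any gain either one shows is spurious; the continuous feature simply gets more tries at finding a lucky threshold.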
Permutation importance
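A more reliable method, permutation importance, shuffles one feature's values in a held-out dataset, which breaks that feature's link to the target, and measures how much accuracy (or another score) drops.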
- If shuffling a feature wrecks accuracy, the model truly depended on it.
- If shuffling barely changes accuracy, the model did not rely on that feature.
- Permutation importance works on any model, not just trees, and largely avoids the cardinality bias.
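The procedure above can be sketched model-agnostically in plain Python. This is a minimal sketch, not a library implementation: `permutation_importance_sketch`, the toy data, and the stand-in `model` are hypothetical, rows are plain lists, and accuracy is the score. scikit-learn offers a full version as `sklearn.inspection.permutation_importance`.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance_sketch(model, X, y, n_repeats=5, seed=0):
    """Mean accuracy drop when each column of X is shuffled on its own."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical held-out data: the label depends only on feature 0.
rng = random.Random(1)
X_test = [[rng.random(), rng.random()] for _ in range(500)]
y_test = [int(row[0] > 0.5) for row in X_test]


def model(row):
    # Stand-in for a fitted model: uses feature 0 and ignores feature 1
    return int(row[0] > 0.5)


print(permutation_importance_sketch(model, X_test, y_test))
```

Shuffling feature 0 drops accuracy to about chance, so its score lands near 0.5; feature 1 scores exactly 0 because the model's predictions never change when it is shuffled.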
Reading importance carefully
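Importance shows what the model used, not what causes the outcome. Correlated features can share or split importance in confusing ways: a tree may split on either of two correlated features, dividing the impurity credit between them, and shuffling one of them costs little accuracy when the other still carries the same information. Treat the scores as a hint, not proof of mechanism.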
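The splitting effect can be reproduced with two identical features. In this sketch, the data and the hypothetical `model`, which averages both copies, are assumptions chosen for illustration. Shuffling either copy alone should cost roughly a quarter of the accuracy, while shuffling both together with one shared permutation should cost about half, so each copy looks only half as important as the information it carries.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permuted_drop(model, X, y, columns, n_repeats=10, seed=0):
    """Mean accuracy drop when the given columns are shuffled together (one shared permutation)."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        order = list(range(len(X)))
        rng.shuffle(order)
        X_perm = [list(row) for row in X]
        for i, src in enumerate(order):
            for j in columns:
                X_perm[i][j] = X[src][j]  # move column values, keep other columns in place
        drops.append(baseline - accuracy(model, X_perm, y))
    return sum(drops) / n_repeats


rng = random.Random(2)
u = [rng.random() for _ in range(2000)]
X = [[v, v] for v in u]          # feature 1 is an exact copy of feature 0
y = [int(v > 0.5) for v in u]


def model(row):
    # Hypothetical model that spread its reliance across both copies
    return int((row[0] + row[1]) / 2 > 0.5)


drop_0 = permuted_drop(model, X, y, [0])
drop_1 = permuted_drop(model, X, y, [1])
drop_both = permuted_drop(model, X, y, [0, 1])
print(f"shuffle x0: {drop_0:.2f}, shuffle x1: {drop_1:.2f}, shuffle both: {drop_both:.2f}")
```

Shuffling correlated features as a group, as in the last call, is one way to recover the importance they share.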
Key idea
Tree importance ranks features by impurity reduction or by accuracy lost when a feature is shuffled.