Why measure it
Feature importance tells you which inputs matter most. It guides feature selection, builds trust, and can reveal data leakage or bugs.
Tree-based scores
Tree ensembles offer cheap, built-in importance scores.
- Split gain sums how much each feature reduced the loss across all its splits.
- Split count tallies how often a feature was used.
- Both are fast but biased toward high-cardinality features, which offer many possible split points.
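As a minimal sketch of how the two tallies differ, the snippet below aggregates a made-up list of splits. The `splits` records and the feature names are hypothetical illustration data, not output from any real library; a real ensemble would expose its own way of reading split statistics.

```python
from collections import defaultdict

# Hypothetical record of every split in a trained ensemble:
# (feature used, loss reduction achieved by that split). Values are made up.
splits = [
    ("age", 4.0),
    ("income", 1.5),
    ("age", 2.0),
    ("zip_code", 0.5),
    ("zip_code", 0.25),
    ("zip_code", 0.25),
]

def tree_importances(splits):
    """Return (split gain, split count) per feature."""
    gain = defaultdict(float)
    count = defaultdict(int)
    for feature, loss_reduction in splits:
        gain[feature] += loss_reduction  # split gain: total loss reduction
        count[feature] += 1              # split count: number of uses
    return dict(gain), dict(count)

gain, count = tree_importances(splits)
print(gain)
print(count)
```

In this toy data the high-cardinality `zip_code` is used most often but contributes the least gain, so the two scores rank the features differently.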
Permutation importance
A more honest, model-agnostic method is permutation importance.
- Measure the baseline validation score.
- Shuffle one feature column to break its link to the target.
- Re-score the model on the shuffled data; the drop from the baseline is that feature's importance.
Because it tests the trained model on held-out data, it reflects predictive value, not just how often a feature was used during training.
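The shuffle-and-rescore steps above can be sketched in plain Python. Everything here is an assumption for illustration: `predict` is a fixed stand-in for a trained model, the data is synthetic, the score is negative mean squared error, and averaging over `n_repeats` shuffles is one common way to reduce noise rather than part of the definition.

```python
import random

def predict(row):
    # Stand-in for a trained model (assumption): leans heavily on x0,
    # a little on x1, and never looks at x2.
    return 3.0 * row[0] + 1.0 * row[1]

def score(X, y):
    # Negative mean squared error, so higher is better.
    return -sum((predict(r) - t) ** 2 for r, t in zip(X, y)) / len(y)

def permutation_importance(X, y, n_repeats=5, seed=0):
    rng = random.Random(seed)
    baseline = score(X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Synthetic validation data (assumption): y depends on x0 and x1, not x2.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [3.0 * r[0] + 1.0 * r[1] + data_rng.gauss(0, 0.1) for r in X]

importances = permutation_importance(X, y)
print(importances)
```

Shuffling x2 leaves the score unchanged because the stand-in model never reads it, while shuffling x0 hurts far more than shuffling x1.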
Cautions
- Correlated features can share or hide importance, splitting credit between them.
- Importance shows association, not causation.
- For local, per-prediction explanations, use methods like SHAP values.
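The first caution, about correlated features, can be seen in a toy case. The sketch below assumes an extreme setup: column 1 is an exact copy of column 0, and two hand-written stand-in models (`solo` and `shared`, both hypothetical) make identical predictions, one using only column 0 and the other spreading its weight across both copies.

```python
import random

def mse(predict, X, y):
    return sum((predict(r) - t) ** 2 for r, t in zip(X, y)) / len(y)

def importance(predict, X, y, j, seed=0):
    # Rise in mean squared error after shuffling column j once.
    rng = random.Random(seed)
    column = [row[j] for row in X]
    rng.shuffle(column)
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
    return mse(predict, X_perm, y) - mse(predict, X, y)

data_rng = random.Random(1)
signal = [data_rng.uniform(-1, 1) for _ in range(500)]
X = [[s, s] for s in signal]   # column 1 duplicates column 0
y = [3.0 * s for s in signal]

def solo(row):
    # Puts all of its weight on column 0.
    return 3.0 * row[0]

def shared(row):
    # Same predictions on this data, but weight is split across the copies.
    return 1.5 * row[0] + 1.5 * row[1]

solo_imp = importance(solo, X, y, 0)
shared_imp = [importance(shared, X, y, j) for j in (0, 1)]
print(solo_imp, shared_imp)
```

Under these assumptions each copy in `shared` gets roughly a quarter of the importance that `solo` assigns to the single column, even though both models predict equally well. The underlying signal looks weaker simply because an intact copy covers for the shuffled one.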
Key idea
Feature importance ranks inputs by their effect on predictions, and permutation importance gives an honest, model-agnostic measure by shuffling each feature and measuring the drop in score.