Feature Selection Methods
Not every feature helps. Feature selection trims the input set to reduce overfitting, speed training, and improve interpretability by keeping only informative columns.
Three families
- Filter methods rank features by a statistic computed independently of any model, such as correlation, mutual information, or a chi-squared test. They are fast, but because each feature is typically scored on its own, they miss feature interactions.
- Wrapper methods train a model on different feature subsets and keep the best, as in recursive feature elimination. They can be accurate because features are judged by the model that will use them, but they are expensive because every candidate subset requires another fit.
- Embedded methods select features during training, as when L1 regularization drives coefficients to zero or tree-based importances rank features.
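As a rough illustration of the three families, the sketch below applies one method of each kind to the same synthetic data. It assumes scikit-learn is available; the dataset sizes, the choice of k=5, the L1-penalized linear SVM, and the value of C are illustrative assumptions, not recommendations from the text.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Synthetic data (assumed sizes): 20 features, 5 of them informative.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=0, random_state=0
)

# Filter: score each feature on its own with mutual information, keep the top 5.
# partial() fixes the seed of the randomized mutual information estimator.
mi_score = partial(mutual_info_classif, random_state=0)
filter_sel = SelectKBest(score_func=mi_score, k=5).fit(X, y)

# Wrapper: recursive feature elimination refits the model repeatedly,
# dropping the weakest feature each round until 5 remain.
wrapper_sel = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)

# Embedded: an L1-penalized linear model zeroes out coefficients while training;
# SelectFromModel keeps the features whose coefficients stay nonzero.
l1_model = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000)
embedded_sel = SelectFromModel(l1_model).fit(X, y)

for name, sel in [("filter", filter_sel), ("wrapper", wrapper_sel), ("embedded", embedded_sel)]:
    print(name, np.flatnonzero(sel.get_support()))
```

The three selectors will not necessarily agree. Fitting them on the full dataset is done here only to show the APIs; for model evaluation the selection step should be refit on each training fold.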
Practical guidance
- Remove redundant features: when several are highly correlated with each other, keep one representative and drop the rest.
- Watch for selection performed on the whole dataset: it leaks test information and inflates validation scores. Selection belongs inside cross-validation, refit on each training fold.
- More features are not automatically better. Irrelevant columns add noise and dilute signal.
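The redundancy check in the first bullet can be sketched with NumPy alone. The helper name drop_correlated and the 0.9 threshold are hypothetical choices for illustration, and the sketch assumes no column is constant (a constant column has an undefined correlation).

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Return indices of columns to keep.

    Columns are scanned left to right; a column is dropped when its absolute
    Pearson correlation with any already-kept column exceeds `threshold`.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep

# Toy data: column 2 is column 0 plus a little noise, so it is redundant.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + 0.01 * rng.normal(size=200)])

kept = drop_correlated(X)
print(kept)  # column 2 is dropped as a near-duplicate of column 0
```

Which member of a correlated group survives depends on column order here; a more careful variant might keep the one most associated with the target.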
Filter methods make a fast first pass, while embedded methods often give the best accuracy-to-cost tradeoff.
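One way to combine a filter first pass with an embedded method while keeping selection inside cross-validation is to put every step in a single pipeline, so each fold refits the selectors on its own training split. The sketch below assumes scikit-learn; the data sizes, k=20, and C=0.1 are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data (assumed sizes): 50 features, 5 of them informative.
X, y = make_classification(n_samples=300, n_features=50, n_informative=5, random_state=0)

pipe = make_pipeline(
    StandardScaler(),
    # Filter pass: cheap univariate F-test keeps the 20 highest-scoring columns.
    SelectKBest(f_classif, k=20),
    # Embedded pass: L1-penalized linear model drops columns with zero coefficients.
    SelectFromModel(LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000)),
    LogisticRegression(max_iter=1000),
)

# cross_val_score clones and refits the whole pipeline per fold,
# so the held-out fold never influences which features are selected.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Running the selectors once on all of X and then cross-validating only the final classifier would reintroduce the leak described above.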
Key idea
Feature selection uses filter, wrapper, and embedded methods to keep informative columns, and the selection itself must stay inside cross-validation to avoid leakage.