The goal
Model selection chooses among candidate models or hyperparameter settings. The aim is the best performance on unseen data, not the lowest training error: training error typically keeps falling as complexity grows, so it tends to favor the most complex model.
Held-out estimates
- A single validation set gives a quick estimate but is noisy on small data.
- Cross-validation rotates the validation fold across the data, so every point is held out once, and averages the fold scores for a more stable estimate.
- Keep a final test set untouched until the very end.
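As a rough illustration of the rotation described above, here is a minimal k-fold sketch in plain Python. The two candidates (a constant predictor and a least-squares line), the synthetic data, and helper names such as `k_fold_indices` and `cv_mse` are assumptions made up for this example, not part of any library. The final test split is left out for brevity; in practice it would be set aside before any of this runs.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(xs, ys):
    """Candidate 1: predict the training mean everywhere."""
    m = statistics.fmean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares line y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def cv_mse(fit, xs, ys, k=5, seed=0):
    """Return the per-fold validation MSE for one candidate."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errs = [(ys[i] - model(xs[i])) ** 2 for i in held_out]
        scores.append(statistics.fmean(errs))
    return scores

# Synthetic data (invented for illustration): a noisy linear trend.
rng = random.Random(42)
xs = [i / 10 for i in range(40)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]

cv_scores = {name: cv_mse(fit, xs, ys)
             for name, fit in [("mean", fit_mean), ("line", fit_line)]}
for name, fold_scores in cv_scores.items():
    print(name, round(statistics.fmean(fold_scores), 3))

# Select the candidate with the lowest mean validation error.
best = min(cv_scores, key=lambda name: statistics.fmean(cv_scores[name]))
```

Both candidates are scored on the same folds (same seed), so the comparison is not blurred by different splits.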
Penalizing complexity
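For probabilistic models, information criteria balance goodness of fit against the number of parameters.
- The Akaike information criterion (AIC) rewards fit and penalizes each parameter by a fixed amount, aiming at predictive accuracy.
- The Bayesian information criterion (BIC) penalizes each parameter by ln(n), so its penalty grows with the sample size n and exceeds that of AIC once n is at least 8; it aims at identifying the true model, assuming that model is among the candidates.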
- Lower values are preferred under both.
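The two criteria can be written out in a few lines. With k parameters, n observations, and maximized log-likelihood ln L, the standard definitions are AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The sketch below applies them to two hypothetical fitted models; the log-likelihoods, parameter counts, and sample size are invented numbers, picked so that the two criteria disagree.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical fits: name -> (maximized log-likelihood, parameter count).
n = 200
candidates = {"small": (-310.0, 3), "large": (-303.0, 8)}

aic_scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
bic_scores = {name: bic(ll, k, n) for name, (ll, k) in candidates.items()}

# Lower is better under both criteria.
aic_choice = min(aic_scores, key=aic_scores.get)
bic_choice = min(bic_scores, key=bic_scores.get)
print("AIC prefers", aic_choice, "| BIC prefers", bic_choice)
```

With these made-up numbers the larger model gains enough likelihood to pay AIC's penalty of 2 per parameter, but not BIC's penalty of ln(200), roughly 5.3, per parameter, so the criteria pick different models.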
Avoiding pitfalls
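- Tuning many choices on one split causes selection bias: the winning score is optimistic because it was picked for being the best. Use nested cross-validation, which tunes in an inner loop and estimates performance in an outer loop.
- Prefer the simplest model whose error is within one standard error of the best (the one-standard-error rule).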
- Watch for leakage, such as preprocessing or feature selection fitted on all the data before splitting, which makes validation look better than reality.
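The one-standard-error rule from the list above can be sketched as follows. The per-fold errors are invented for illustration, the function name `one_standard_error_choice` is hypothetical, and the candidates are assumed to be listed from simplest to most complex.

```python
import math
import statistics

def one_standard_error_choice(fold_errors):
    """Pick the simplest candidate within one standard error of the best.

    fold_errors: list of (name, per-fold errors), ordered simplest first.
    Returns (chosen_name, best_name).
    """
    summary = []
    for name, errs in fold_errors:
        mean = statistics.fmean(errs)
        # Standard error of the mean across folds.
        se = statistics.stdev(errs) / math.sqrt(len(errs))
        summary.append((name, mean, se))

    best_name, best_mean, best_se = min(summary, key=lambda t: t[1])
    threshold = best_mean + best_se

    # Walk from simplest to most complex; stop at the first one that qualifies.
    for name, mean, _ in summary:
        if mean <= threshold:
            return name, best_name

# Hypothetical cross-validation errors, simplest model first.
fold_errors = [
    ("degree 1", [0.90, 0.95, 0.88, 0.92, 0.91]),
    ("degree 2", [0.51, 0.46, 0.54, 0.50, 0.48]),
    ("degree 5", [0.50, 0.44, 0.53, 0.49, 0.47]),
]

chosen, best = one_standard_error_choice(fold_errors)
print("lowest mean error:", best, "| selected:", chosen)
```

Here the most complex candidate has the lowest mean error, but the middle one sits within one standard error of it and is simpler, so the rule selects the middle one. Treating the fold scores as independent when computing the standard error is itself an approximation.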
Key idea
Model selection targets generalization: compare candidates with cross-validation or with complexity penalties such as the Akaike and Bayesian criteria, then confirm the chosen model on an untouched test set while guarding against leakage.