The Model Selection Criteria

The goal

Model selection chooses among candidate models or hyperparameter settings. The aim is the best performance on unseen data, not the lowest training error: training error generally keeps falling as complexity grows, so it tends to favor the most complex candidate.

Held out estimates

A single validation set gives a quick estimate but is noisy on small data.
Cross-validation rotates the validation fold across the data and averages the per-fold scores, giving a more stable estimate at the cost of one model fit per fold.
Keep a final test set untouched until the very end.
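
To make the rotation concrete, here is a minimal sketch of k-fold cross-validation using only the Python standard library. The helper names (k_fold_indices, cross_val_mse), the two candidate models, and the synthetic data are all assumptions made for illustration, not part of the original notes.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices once, then deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_mse(xs, ys, fit, predict, k=5, seed=0):
    """Return one validation mean squared error per fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training part; the held-out fold is never seen.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        scores.append(statistics.fmean(errors))
    return scores


# Hypothetical candidate 1: always predict the training mean.
def fit_mean(xs, ys):
    return statistics.fmean(ys)


def predict_mean(model, x):
    return model


# Hypothetical candidate 2: least-squares straight line.
def fit_line(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Synthetic data (illustrative only): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

mean_scores = cross_val_mse(xs, ys, fit_mean, predict_mean)
line_scores = cross_val_mse(xs, ys, fit_line, predict_line)
print("mean model CV MSE:", statistics.fmean(mean_scores))
print("line model CV MSE:", statistics.fmean(line_scores))
```

In this sketch the candidate with the lower average validation error would be selected. A final test set, split off before any of this runs, would still be needed to report the chosen model's performance.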

Penalizing complexity

For probabilistic models, information criteria balance fit against size.

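The Akaike information criterion (AIC) rewards fit and charges a fixed cost per parameter, aiming at predictive accuracy.
The Bayesian information criterion (BIC) charges each parameter the logarithm of the sample size, so its penalty grows with the data and exceeds that of AIC once there are more than about seven observations; it aims at recovering the true model, assuming that model is among the candidates.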
Lower values are preferred under both.
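
The two criteria can be sketched directly from their standard definitions, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the parameter count, n the sample size, and ln L the maximized log-likelihood. The log-likelihoods and parameter counts below are made-up numbers chosen only to show the two criteria disagreeing.

```python
import math


def aic(log_likelihood, n_params):
    """AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """BIC = k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


# Hypothetical fits on n = 200 observations (values invented for illustration).
n = 200
candidates = {
    "small": {"log_likelihood": -310.0, "n_params": 3},
    "large": {"log_likelihood": -305.0, "n_params": 6},
}

aic_scores = {name: aic(**c) for name, c in candidates.items()}
bic_scores = {name: bic(n_samples=n, **c) for name, c in candidates.items()}

print("AIC prefers:", min(aic_scores, key=aic_scores.get))
print("BIC prefers:", min(bic_scores, key=bic_scores.get))
```

With these invented numbers the larger model wins under AIC while the smaller one wins under BIC, because the per-parameter cost of ln(200), roughly 5.3, outweighs the gain in fit.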

Avoiding pitfalls

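Tuning many choices on one split causes selection bias, because the winning score is optimistically biased; use nested cross-validation, where an inner loop tunes and an outer loop estimates performance.
Prefer the simplest model whose score is within one standard error of the best (the one-standard-error rule).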
Watch for leakage that makes validation look better than reality.
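
The one-standard-error rule can be sketched as follows. This assumes per-fold validation errors are already available for each candidate and that candidates are listed from simplest to most complex; the function name and the fold scores are hypothetical.

```python
import math
import statistics


def one_standard_error_choice(fold_scores_by_model):
    """Pick the simplest model whose mean error is within one SE of the best.

    fold_scores_by_model: list of (name, per-fold errors) pairs, ordered
    from simplest to most complex. Lower error is better.
    """
    summary = []
    for name, scores in fold_scores_by_model:
        mean = statistics.fmean(scores)
        # Standard error of the mean across folds.
        se = statistics.stdev(scores) / math.sqrt(len(scores))
        summary.append((name, mean, se))

    best_name, best_mean, best_se = min(summary, key=lambda row: row[1])
    threshold = best_mean + best_se

    # The first (simplest) candidate under the threshold wins.
    for name, mean, _ in summary:
        if mean <= threshold:
            return name
    return best_name


# Hypothetical per-fold validation errors, simplest model first.
results = [
    ("degree 1", [0.90, 1.00, 0.95, 1.05, 0.85]),
    ("degree 2", [0.51, 0.50, 0.54, 0.48, 0.52]),
    ("degree 5", [0.50, 0.47, 0.55, 0.46, 0.52]),
]
choice = one_standard_error_choice(results)
print("selected:", choice)
```

Here the degree 5 candidate has the lowest mean error, but degree 2 falls within one standard error of it, so the simpler model is selected.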

Key idea

Model selection targets generalization using cross-validation and complexity penalties such as the Akaike and Bayesian information criteria, then confirms the choice on an untouched test set while guarding against leakage.
