Machine Learning

The Model Selection Criteria

Pick the model that will generalize, not the one that memorized.

The goal

Model selection chooses among candidate models or hyperparameter settings. The aim is the best performance on unseen data, not the lowest training error: adding capacity almost never raises training error, so that criterion tends to favor the most complex candidate.

Held-out estimates

  • A single validation set gives a quick estimate, but that estimate is noisy when data is scarce.
  • Cross-validation rotates the validation fold across the data and averages the fold scores for a more stable estimate.
  • Keep a final test set untouched until the very end.
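
The three points above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the synthetic data, the two candidate models (a constant baseline and a straight line), and the helper names `k_fold_indices`, `fit_mean`, `fit_line`, and `cv_mse` are all assumptions made for the example.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Second candidate: ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def cv_mse(fit, xs, ys, k=5):
    """Average validation mean squared error over k folds."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)  # fit only on the training folds
        scores.append(sum((model(xs[i]) - ys[i]) ** 2 for i in fold) / len(fold))
    return sum(scores) / len(scores)

# Synthetic data (assumed for illustration): a noisy linear relationship.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

# Set the test set aside before any model comparison happens.
test_x, test_y = xs[:12], ys[:12]
dev_x, dev_y = xs[12:], ys[12:]

# Select among candidates using cross-validation on the development data only.
candidates = {"mean": fit_mean, "line": fit_line}
cv_scores = {name: cv_mse(fit, dev_x, dev_y) for name, fit in candidates.items()}
best_name = min(cv_scores, key=cv_scores.get)

# Touch the test set once, and only for the selected model.
final_model = candidates[best_name](dev_x, dev_y)
test_mse = sum((final_model(x) - y) ** 2 for x, y in zip(test_x, test_y)) / len(test_x)
print(best_name, round(test_mse, 3))
```

The design point is the order of operations: the test rows are split off first, every comparison uses only the development rows, and the test error is computed a single time at the end.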

Penalizing complexity

For probabilistic models, information criteria trade goodness of fit (the maximized likelihood) against model size (the number of parameters).

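  • The Akaike information criterion (AIC) rewards fit and charges a fixed penalty of 2 per parameter, aiming at predictive accuracy.
  • The Bayesian information criterion (BIC) charges ln(n) per parameter for n observations, so its penalty grows with the data and exceeds AIC's once n is 8 or more; it aims at recovering the true model when that model is among the candidates.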
  • Lower values are preferred under both.
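
As a small worked example of the two criteria, the sketch below uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the parameter count, n the number of observations, and ln L the maximized log-likelihood. The two models and their log-likelihood values are hypothetical numbers chosen to show the criteria disagreeing; they do not come from a real fit.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood

n = 200  # number of observations (hypothetical)

# Hypothetical candidates: the larger model fits slightly better.
models = {
    "small": {"k": 3, "loglik": -310.0},
    "large": {"k": 6, "loglik": -305.0},
}

aic_scores = {name: aic(m["loglik"], m["k"]) for name, m in models.items()}
bic_scores = {name: bic(m["loglik"], m["k"], n) for name, m in models.items()}

# Lower is better under both criteria.
aic_choice = min(aic_scores, key=aic_scores.get)
bic_choice = min(bic_scores, key=bic_scores.get)
print("AIC picks:", aic_choice)
print("BIC picks:", bic_choice)
```

With these inputs AIC accepts the three extra parameters for a 5-unit gain in log-likelihood, while BIC's heavier ln(200) penalty per parameter tips the choice to the smaller model.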

Avoiding pitfalls

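  • Tuning many choices on one split makes the winning score optimistically biased (selection bias); nested cross-validation tunes in an inner loop and scores the tuned model in an outer loop to avoid this.
  • Prefer the simplest model whose score is within one standard error of the best (the one-standard-error rule).
  • Watch for leakage, such as preprocessing fitted on the full dataset before splitting, that makes validation look better than reality.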
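
The one-standard-error rule from the list above can be sketched as follows. The per-fold errors and model names are hypothetical values invented for illustration, and the standard error is computed from fold scores as if they were independent, which is a common approximation rather than an exact result.

```python
import math
import statistics

# Hypothetical per-fold validation errors, listed from simplest to most complex.
fold_errors = {
    "degree_1": [0.52, 0.49, 0.55, 0.50, 0.54],
    "degree_2": [0.31, 0.34, 0.30, 0.34, 0.31],
    "degree_3": [0.30, 0.33, 0.29, 0.35, 0.31],
    "degree_5": [0.29, 0.36, 0.28, 0.34, 0.32],
}

def one_standard_error_choice(fold_errors):
    """Return (chosen, best, threshold) under the one-standard-error rule.

    Assumes the dict is ordered from simplest to most complex model.
    """
    names = list(fold_errors)
    means = {m: statistics.mean(e) for m, e in fold_errors.items()}
    best = min(names, key=means.get)
    # Standard error of the best model's mean fold error.
    best_errors = fold_errors[best]
    se = statistics.stdev(best_errors) / math.sqrt(len(best_errors))
    threshold = means[best] + se
    # The first (simplest) model at or under the threshold is selected.
    for m in names:
        if means[m] <= threshold:
            return m, best, threshold

chosen, best, threshold = one_standard_error_choice(fold_errors)
print("lowest mean error:", best)
print("one-standard-error choice:", chosen)
```

Here the lowest mean error belongs to `degree_3`, but `degree_2` falls within one standard error of it, so the rule selects the simpler model.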

Key idea

Model selection targets generalization: estimate it with cross-validation or with complexity penalties such as the Akaike and Bayesian criteria, confirm the choice on an untouched test set, and guard against leakage throughout.

Check yourself

1. Why not select a model by lowest training error?

2. How does the Bayesian criterion differ from the Akaike criterion?

3. What does nested cross-validation guard against?