Cross Validation K Fold

Estimate generalization by rotating which data slice serves as validation.

K fold cross validation estimates how well a model generalizes by splitting the data into k folds of roughly equal size and rotating which fold is held out for validation.

The procedure

  • Split the training data into k folds of roughly equal size.
  • For each fold, train on the other k minus one folds and validate on the held out fold.
  • Average the k validation scores to get a robust performance estimate.
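The three steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names (`k_fold_indices`, `cross_validate`), the mean-predicting toy model, and the toy data are all assumptions made for the example. The folds here are contiguous slices; in practice the data is usually shuffled first unless its order matters.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of roughly equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, k, fit, score):
    """Return the k validation scores, one per held out fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Train on the other k - 1 folds, validate on the held out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(
            score(model, [xs[i] for i in held_out], [ys[i] for i in held_out])
        )
    return scores


# Hypothetical toy model: always predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return mean(train_y)


def mean_abs_error(model, val_x, val_y):
    return mean(abs(y - model) for y in val_y)


xs = list(range(10))
ys = [2.0 * x for x in xs]
fold_scores = cross_validate(xs, ys, k=5, fit=fit_mean, score=mean_abs_error)
cv_estimate = mean(fold_scores)  # the averaged cross validation estimate
```

Keeping the per fold scores around, not just their average, also lets you see how much the estimate varies from fold to fold.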

Why it helps

  • A single split can be lucky or unlucky. Averaging k results gives a lower variance estimate.
  • Every example is used for validation exactly once and for training k minus one times.
  • It uses data efficiently, which matters most on small datasets.
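The "validated once, trained on k minus one times" property can be checked by counting. The sketch below assumes a simple round-robin assignment (sample i goes to fold i mod k) purely for illustration; real splitters typically shuffle first, but the counts come out the same.

```python
from collections import Counter

n_samples, k = 12, 5

# Round-robin assignment: sample i goes to fold i % k (an illustrative choice).
folds = [[i for i in range(n_samples) if i % k == f] for f in range(k)]

val_counts, train_counts = Counter(), Counter()
for held_out in folds:
    held = set(held_out)
    for i in range(n_samples):
        if i in held:
            val_counts[i] += 1    # used for validation in this round
        else:
            train_counts[i] += 1  # used for training in this round

# Every sample is validated exactly once and trained on k - 1 times.
assert all(val_counts[i] == 1 for i in range(n_samples))
assert all(train_counts[i] == k - 1 for i in range(n_samples))

fold_sizes = [len(fold) for fold in folds]  # sizes differ by at most one
```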

Practical tips

  • Common choices are 5 or 10 folds. More folds means more model fits and higher cost, but each model trains on a larger share of the data.
  • Use stratified folds for classification so each fold keeps roughly the same class ratios as the full dataset.
  • Keep a separate test set untouched. Cross validation guides choices such as model type and hyperparameters, while the test set gives the final unbiased number.
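Stratification can be sketched as grouping indices by class and dealing each class across the folds in turn. The function name `stratified_folds` and the toy labels are assumptions for this example, and the round-robin dealing is a simplification (it always starts at the first fold, so leftovers pile up there). Libraries such as scikit-learn provide a `StratifiedKFold` splitter for real use.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Assign indices to k folds so each fold roughly mirrors the class ratios."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    # Deal each class's indices round-robin across the folds.
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical labels with a 2:1 class ratio.
labels = ["neg"] * 8 + ["pos"] * 4
folds = stratified_folds(labels, k=4)

# Class counts inside each fold; each should keep the 2:1 ratio.
per_fold = [Counter(labels[i] for i in fold) for fold in folds]
```

With an unstratified split of the same ordered labels, some folds would contain only one class, which makes the validation score for those folds uninformative.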

Key idea

K fold cross validation rotates the validation slice across k folds and averages the scores, giving a lower variance, data efficient estimate of generalization.

Check yourself

1. In 5 fold cross validation, how many times is each example used for validation?

2. Why use stratified folds for classification?

3. What is the main benefit over a single split?