The problem with one split
A single validation split is noisy. If that one slice happens to be easy or hard, your estimate of model quality swings. With small datasets this noise can be large.
The rotation idea
K fold cross validation divides the data into k equal parts called folds. You train k times, each time holding out a different fold for validation and training on the rest.
- Every row gets used for validation exactly once.
- Every row gets used for training k minus one times.
- The final score is the average across the k folds.
Choosing k
- Five or ten folds are the usual choices.
- Larger k means more training runs but a less biased estimate.
- The extreme case where k equals the number of rows is called leave one out, which is thorough but expensive.
What it buys you
- A more stable estimate of performance with a sense of its variance.
- Efficient use of limited data, since nothing is permanently set aside for validation.
- Note the test set still stays separate for the final report.
Key idea
K fold cross validation rotates the held out fold so every row validates once, giving a stable averaged estimate from limited data.