Hyperparameter Search Strategies

The task

Hyperparameters are settings you choose before training, like learning rate, batch size, and number of layers. Searching for good values is expensive because each trial is a full or partial training run.

Three core strategies

Grid search tries every combination on a fixed grid. It is simple but the number of trials explodes with each added dimension, the curse of dimensionality.
Random search samples combinations at random. It often beats grid search because only a few hyperparameters truly matter, and random sampling explores those important axes more finely.
Bayesian optimization builds a model of the objective from past trials and picks the next point expected to improve most. It needs fewer trials but adds bookkeeping and assumes a fairly smooth objective.

Early stopping the search

Successive halving and Hyperband run many trials briefly, kill the worst, and give survivors more budget. This focuses compute on promising configurations instead of training every trial to the end.

Key idea

Random search usually beats grid search because few hyperparameters matter, Bayesian optimization spends trials more wisely, and halving methods stop weak trials early to save compute.

Hyperparameter Search Strategies

The task

Three core strategies

Early stopping the search

Key idea

Check yourself