Machine Learning

Hyperparameter Search Strategies

Grid, random, and Bayesian methods for tuning learning rate, depth, and more.

6 min read · core

The task

Hyperparameters are settings you choose before training, like learning rate, batch size, and number of layers. Searching for good values is expensive because each trial is a full or partial training run.

Three core strategies

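  • Grid search tries every combination on a fixed grid. It is simple, but the number of trials grows exponentially with the number of hyperparameters: 5 values for each of 4 hyperparameters already means 5^4 = 625 trials. This is the curse of dimensionality.
  • Random search samples combinations at random. It often beats grid search because usually only a few hyperparameters truly matter, and random sampling explores those important axes more finely: 16 random trials try 16 different values of each hyperparameter, while a 4 × 4 grid tries only 4 values of each.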
  • Bayesian optimization builds a model of the objective from past trials and picks the next point that the model expects to improve most on the best result so far. It usually needs fewer trials, but it adds bookkeeping (the model must be updated after every trial) and assumes a fairly smooth objective.
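To make the grid versus random comparison concrete, here is a minimal sketch using only the Python standard library. The `objective` function is a made-up stand-in for validation loss (it is not from any real model), and the names `lr` and `dropout` are illustrative assumptions. The toy loss depends only on `lr`, which mimics the case where a single hyperparameter matters.

```python
import itertools
import random


def objective(lr, dropout):
    # Hypothetical toy loss: only lr matters, dropout has no effect.
    return (lr - 0.37) ** 2


def grid_search(values_per_axis):
    # Evenly spaced values in [0, 1] on each axis, then every combination.
    axis = [i / (values_per_axis - 1) for i in range(values_per_axis)]
    return list(itertools.product(axis, axis))


def random_search(n_trials, rng):
    # Each trial draws both hyperparameters independently from [0, 1).
    return [(rng.random(), rng.random()) for _ in range(n_trials)]


def best_loss(trials):
    return min(objective(lr, dropout) for lr, dropout in trials)


rng = random.Random(0)
grid_trials = grid_search(4)            # 4 x 4 = 16 trials
random_trials = random_search(16, rng)  # same trial budget

# How many different values of the important axis did each method try?
grid_distinct_lr = len({lr for lr, _ in grid_trials})
random_distinct_lr = len({lr for lr, _ in random_trials})

print("grid:  ", grid_distinct_lr, "distinct lr values, best loss", best_loss(grid_trials))
print("random:", random_distinct_lr, "distinct lr values, best loss", best_loss(random_trials))
```

With the same budget of 16 trials, the grid only ever tests 4 learning rates, while random search tests 16. Whether random search actually finds a lower loss in a given run depends on the seed and the objective, so treat this as an illustration of coverage rather than a benchmark.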

Early stopping the search

Successive halving and Hyperband run many trials briefly, kill the worst, and give survivors more budget. This focuses compute on promising configurations instead of training every trial to the end.
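The loop described above can be sketched in a few lines. This is a simplified illustration of successive halving only, not Hyperband and not any library's implementation. The function `train_and_score`, the halving factor `eta`, and the starting budget are all assumptions made for the example, and the "training" is a toy formula rather than a real run.

```python
import random


def train_and_score(config, budget):
    # Hypothetical stand-in for a partial training run.
    # Loss shrinks as budget grows, and is lowest for lr near 0.1.
    return abs(config["lr"] - 0.1) + 1.0 / budget


def successive_halving(configs, min_budget=1, eta=2):
    survivors = list(configs)
    budget = min_budget
    history = []  # (number of trials run, budget per trial) for each round
    while len(survivors) > 1:
        # Run every surviving trial at the current budget and rank by loss.
        ranked = sorted(survivors, key=lambda c: train_and_score(c, budget))
        history.append((len(survivors), budget))
        # Keep the best 1/eta of the trials, then give them eta times more budget.
        keep = max(1, len(survivors) // eta)
        survivors = ranked[:keep]
        budget *= eta
    return survivors[0], history


rng = random.Random(0)
# Sample 8 learning rates log-uniformly between 1e-4 and 1.
configs = [{"lr": 10 ** rng.uniform(-4, 0)} for _ in range(8)]

best, history = successive_halving(configs)
print("rounds (trials, budget):", history)
print("best config:", best)
```

Starting from 8 configurations, the rounds run 8, 4, and 2 trials at budgets 1, 2, and 4. In this sketch that costs 8×1 + 4×2 + 2×4 = 24 budget units, versus 32 if all eight configurations had been run at budget 4.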

Key idea

Random search usually beats grid search because few hyperparameters matter, Bayesian optimization spends trials more wisely, and halving methods stop weak trials early to save compute.

Check yourself

1. Why does random search often beat grid search?

2. What does Bayesian optimization use to pick the next trial?

3. What do successive halving and Hyperband do?