Bayesian Hyperparameter Optimization

Smarter than guessing

Grid and random search never use the results of earlier trials: every new setting is chosen blind. When each training run is expensive, you want the results so far to guide the choice of the next setting. Bayesian optimization does exactly that.

The surrogate model

The method builds a cheap surrogate model that predicts performance across the hyperparameter space, along with how uncertain each prediction is. It then repeats three steps:

1. Fit the surrogate, often a Gaussian process, to the trials done so far
2. Use an acquisition function to choose the next point, balancing exploration of uncertain regions against exploitation of promising ones
3. Run the real training there, observe the score, and update the surrogate

As trials accumulate, the surrogate sharpens and steers the search toward strong configurations, typically with far fewer expensive runs than grid or random search would need.
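To make the loop concrete, here is a minimal sketch in plain Python, not a production implementation. Several things in it are assumptions made for illustration: a single hyperparameter rescaled to the range 0 to 1, a Gaussian process with a fixed RBF kernel (length scale 0.2), expected improvement as the acquisition function, and a hypothetical validation_score function that stands in for the expensive training run.

```python
import math

NOISE = 1e-4  # small jitter on the kernel diagonal for numerical stability (assumed value)

def rbf(a, b, length=0.2):
    """RBF kernel between two scalar inputs (the length scale is an assumption)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L times its transpose equal to A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Back substitution: solve (L transposed) x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, zs):
    """Step 1: fit the surrogate to the trials so far (zs are standardized scores)."""
    K = [[rbf(a, b) + (NOISE if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    return L, solve_upper_t(L, solve_lower(L, zs))

def predict(xs, L, alpha, x_new):
    """Surrogate's predicted mean and standard deviation at x_new."""
    k_star = [rbf(a, x_new) for a in xs]
    mu = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 1e-12)
    return mu, math.sqrt(var)

def expected_improvement(mu, sigma, best):
    """Acquisition function: expected gain over the best score so far (maximization)."""
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (mu - best) * cdf + sigma * pdf

def validation_score(x):
    """Hypothetical stand-in for an expensive training run; peaks at x = 0.3."""
    return -(x - 0.3) ** 2

def bayes_opt(objective, n_iter=7):
    xs = [0.0, 0.5, 1.0]                        # a few initial trials
    ys = [objective(x) for x in xs]
    candidates = [i / 100 for i in range(101)]  # grid of settings to consider
    for _ in range(n_iter):
        mean = sum(ys) / len(ys)
        std = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys)) or 1.0
        zs = [(y - mean) / std for y in ys]     # standardize scores for the GP
        L, alpha = fit_gp(xs, zs)               # step 1: fit the surrogate
        best = max(zs)

        def acquisition(c):
            mu, sigma = predict(xs, L, alpha, c)
            return expected_improvement(mu, sigma, best)

        # Step 2: pick the untried candidate with the highest acquisition value.
        x_next = max((c for c in candidates if c not in xs), key=acquisition)
        # Step 3: run the real evaluation and record the result.
        xs.append(x_next)
        ys.append(objective(x_next))
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i], xs

best_x, best_y, tried = bayes_opt(validation_score)
print("best setting:", best_x, "score:", best_y)
print("settings tried, in order:", tried)
```

Expected improvement is large where the predicted mean is high (exploitation) or where the predicted standard deviation is high (exploration), which is how the single acquisition value trades the two off. Real libraries also tune the kernel's own parameters and optimize the acquisition function continuously rather than over a fixed grid; both are omitted here to keep the sketch short.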

Tradeoffs

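Bayesian optimization shines when evaluations are costly and the search space has a moderate number of dimensions. It is more complex than random search and harder to parallelize, because each new trial depends on the results of earlier ones, and the surrogate can struggle in very high dimensions. Methods such as BOHB combine it with the early stopping of Hyperband, which is built on successive halving, for speed.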

Key idea

Bayesian optimization fits a surrogate model of performance and uses an acquisition function to balance exploration and exploitation, finding good hyperparameters in few costly trials.
