Machine Learning

Bayesian Hyperparameter Optimization

Using past trials to decide which hyperparameters to test next.

5 min read · advanced

Smarter than guessing

Grid and random search never use the results of earlier trials: every setting is chosen blind. When each training run is expensive, you want to use the results so far to pick the next promising setting. Bayesian optimization does exactly that.

The surrogate model

The method builds a surrogate model that is cheap to evaluate and predicts both performance across the hyperparameter space and the uncertainty of each prediction. Each iteration has three steps:

  • Fit the surrogate, often a Gaussian process, to the trials done so far
  • Use an acquisition function to choose the next point, balancing exploration of uncertain regions against exploitation of promising ones
  • Run the real training there, observe the score, and update the surrogate
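
The lesson does not name a particular acquisition function. One common choice, given here only as an illustration, is expected improvement for a score that is being maximized. The notation is assumed for this sketch: \mu(x) and \sigma(x) are the surrogate's predicted mean and standard deviation at x, f^{*} is the best score observed so far, \Phi and \varphi are the standard normal CDF and PDF, and \xi \ge 0 is an optional exploration margin.

```latex
\mathrm{EI}(x) = \bigl(\mu(x) - f^{*} - \xi\bigr)\,\Phi(z) + \sigma(x)\,\varphi(z),
\qquad z = \frac{\mu(x) - f^{*} - \xi}{\sigma(x)}, \quad \sigma(x) > 0
```

The first term rewards points whose predicted score beats the current best (exploitation). The second term grows with the surrogate's uncertainty (exploration).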

Over successive iterations the surrogate sharpens and steers the search toward strong configurations, typically with far fewer expensive runs than grid or random search would need.
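
The fit, choose, evaluate cycle described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: it tunes one hyperparameter normalized to [0, 1], uses a Gaussian process with a fixed RBF kernel as the surrogate, picks the next point by maximizing expected improvement over a grid, and replaces the expensive training run with a made-up `objective`. All function names and default values are hypothetical.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a small positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            # Clamp so the square root stays defined if rounding pushes s below 0.
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(xs, ys, x_new, noise=1e-4):
    """Surrogate prediction at x_new: posterior mean and standard deviation."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    mean_y = sum(ys) / n  # constant prior mean (an assumption of this sketch)
    alpha = backward_sub(L, forward_sub(L, [y - mean_y for y in ys]))
    k_star = [rbf(x, x_new) for x in xs]
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
    return mu, math.sqrt(var)

def expected_improvement(mu, sigma, best, xi=0.01):
    """Acquisition value for maximization; larger means more worth trying."""
    gain = mu - best - xi
    z = gain / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return gain * cdf + sigma * pdf

def objective(x):
    """Hypothetical stand-in for an expensive training run; peaks at x = 0.3."""
    return 1.0 - (x - 0.3) ** 2

def bayes_opt(f, init_xs=(0.1, 0.9), n_iter=8, grid_size=101):
    """Run the loop: fit surrogate, maximize acquisition, evaluate, update."""
    xs = list(init_xs)
    ys = [f(x) for x in xs]
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    for _ in range(n_iter):
        best = max(ys)
        candidates = [x for x in grid if x not in xs]  # skip settings already tried
        scores = []
        for x in candidates:
            mu, sigma = gp_predict(xs, ys, x)
            scores.append(expected_improvement(mu, sigma, best))
        x_next = candidates[scores.index(max(scores))]
        xs.append(x_next)          # the "real training" happens at x_next
        ys.append(f(x_next))
    return xs, ys

if __name__ == "__main__":
    xs, ys = bayes_opt(objective)
    print("best setting tried:", xs[ys.index(max(ys))], "score:", max(ys))
```

In practice the surrogate and the acquisition search would come from an established library, and the kernel's length scale would be fitted to the data rather than fixed as it is here.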

Tradeoffs

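Bayesian optimization shines when each evaluation is costly and the search space has a moderate number of dimensions. It is more complex than random search and harder to parallelize, because each new trial depends on the results of earlier ones, and the surrogate can struggle in very high dimensions. Successive halving and Hyperband are early-stopping schemes rather than Bayesian methods in their own right; hybrids such as BOHB pair a model-based search with that early stopping for speed.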
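
To make the early-stopping idea concrete, here is a minimal sketch of successive halving on its own. It assumes a hypothetical `evaluate(config, budget)` that returns a score after training a configuration for a given budget (for example, a number of epochs). The candidate list, the `eta` value, and the scoring function are all made up for illustration. A hybrid method would draw the candidates from a surrogate model instead of a fixed list.

```python
import math

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the top 1/eta of configs each round, giving survivors eta times more budget."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score every surviving config at the current (cheap) budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        # Stop the weaker configs early; only the best fraction continues.
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]

def evaluate(config, budget):
    """Hypothetical learning curve: quality peaks at config = 0.3 and
    approaches its final value as the budget grows."""
    final_quality = 1.0 - (config - 0.3) ** 2
    return final_quality * (1.0 - math.exp(-budget))

if __name__ == "__main__":
    candidates = [i / 10 for i in range(9)]  # 0.0, 0.1, ..., 0.8
    print("selected config:", successive_halving(candidates, evaluate))
```

With nine candidates and `eta=3`, the first round keeps three configurations and the second keeps one, so most of the budget goes to the settings that looked best early on.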

Key idea

Bayesian optimization fits a surrogate model of performance and uses an acquisition function to balance exploration and exploitation, finding good hyperparameters in few costly trials.

Check yourself

1. What does the acquisition function balance?

2. When is Bayesian optimization most worthwhile?