Smarter than guessing
Grid search and random search choose every trial without looking at the results of earlier ones, so each run is a blind guess. When each training run is expensive, you want to use the results so far to pick the next promising setting. Bayesian optimization does exactly that.
The surrogate model
The method builds a cheap surrogate model that predicts performance across the hyperparameter space, along with how uncertain each prediction is. It then repeats three steps:
- Fit the surrogate, often a Gaussian process, to the trials done so far
- Use an acquisition function to choose the next point, balancing exploration of uncertain regions against exploitation of promising ones
- Run the real training there, observe the score, and update the surrogate
Over successive iterations the surrogate sharpens and steers the search toward strong configurations, typically with far fewer expensive runs than grid or random search would need.
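The loop above can be sketched in a few dozen lines. This is a minimal illustration, not a production implementation. It assumes a single hyperparameter scaled to [0, 1], a Gaussian process with a fixed RBF kernel, and expected improvement as the acquisition function. The names toy_score, gp_posterior, expected_improvement and bayes_opt are hypothetical, and toy_score stands in for a real training run.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.15):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a GP with unit prior variance."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)  # shape (n_train, n_query)
    y_mean = y_train.mean()             # use the sample mean as a constant prior mean
    mean = y_mean + K_s.T @ np.linalg.solve(K, y_train - y_mean)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best):
    """Expected improvement over the best score so far (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def toy_score(x):
    """Hypothetical stand-in for an expensive training run; peaks at x = 0.3."""
    return math.exp(-((x - 0.3) ** 2) / 0.02)

def bayes_opt(objective, n_init=3, n_iter=10, seed=0):
    """Run the fit / acquire / evaluate loop and return all trials."""
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)  # assumed search grid
    x = rng.uniform(0.0, 1.0, size=n_init)   # a few random trials to start
    y = np.array([objective(v) for v in x])
    for _ in range(n_iter):
        mean, std = gp_posterior(x, y, candidates)       # fit the surrogate
        ei = expected_improvement(mean, std, y.max())    # score each candidate
        x_next = candidates[np.argmax(ei)]               # choose the next point
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))              # run the real evaluation
    return x, y

if __name__ == "__main__":
    xs, ys = bayes_opt(toy_score)
    print("best x:", xs[ys.argmax()], "best score:", ys.max())
```

Expected improvement is large where the predicted mean is high (exploitation) or where the predicted standard deviation is high (exploration), which is how one formula covers both sides of the balance. A real setup would also tune the kernel length scale instead of fixing it.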
Tradeoffs
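Bayesian optimization shines when evaluations are costly and the search space has a moderate number of dimensions. It is more complex than random search and harder to parallelize, because each new trial depends on the results of earlier ones, and the surrogate can struggle in very high dimensions. Methods such as BOHB combine it with early stopping for speed, using Hyperband, which is built on successive halving.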
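For reference, successive halving on its own can be sketched as follows. This shows only the early-stopping component, not a hybrid with Bayesian optimization. It assumes every configuration can be scored after a small training budget. The names successive_halving and toy_partial_score are hypothetical, and the toy scoring function stands in for partial training.

```python
import math

def successive_halving(configs, train_and_score, min_budget=1, eta=2):
    """Keep the top 1/eta of configs each round and multiply the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    evaluations = 0
    while len(survivors) > 1:
        # Score every surviving config at the current (small) budget.
        ranked = sorted(survivors,
                        key=lambda c: train_and_score(c, budget),
                        reverse=True)
        evaluations += len(survivors)
        # Stop the weaker configs early; give the rest more budget next round.
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0], evaluations

def toy_partial_score(lr, budget):
    """Hypothetical score after `budget` epochs; best learning rate is 1e-2."""
    quality = 1.0 - abs(math.log10(lr) + 2.0) / 5.0
    return quality * (1.0 - 1.0 / (budget + 1))

if __name__ == "__main__":
    learning_rates = [1.0, 0.1, 0.01, 0.001, 1e-4, 1e-5, 1e-6, 1e-7]
    winner, n_evals = successive_halving(learning_rates, toy_partial_score)
    print("winner:", winner, "evaluations:", n_evals)
```

In this sketch each round rescoring starts from scratch. A real system would usually resume training from a checkpoint so that the budget spent in earlier rounds is not wasted.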
Key idea
Bayesian optimization fits a surrogate model of performance and uses an acquisition function to balance exploration and exploitation, finding good hyperparameters in few costly trials.