Bayesian Optimization For Tuning

Build a probabilistic model of the score and choose the next trial intelligently.

Bayesian optimization tunes hyperparameters by treating the validation score as an expensive black-box function and fitting a probabilistic model to the results observed so far. Each new trial is placed where that model predicts either a high score or high uncertainty, so the method typically needs far fewer evaluations than grid or random search.

The loop

Fit a surrogate model, often a Gaussian process, that predicts the score and its uncertainty across the hyperparameter space.
Use an acquisition function to pick the next point, balancing exploitation of promising regions against exploration of uncertain ones.
Evaluate that point, update the surrogate, and repeat.
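
The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the objective `validation_score` is a hypothetical stand-in for a real training run, the search space is a single hyperparameter on a grid over [0, 1], the surrogate is a zero-mean Gaussian process with a fixed-length-scale RBF kernel, and the acquisition function is the upper confidence bound (one common choice among several). All function names and default values are chosen for the example.

```python
import math
import random


def rbf(a, b, length_scale=0.15):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(xs, ys, jitter=1e-6):
    """Fit a zero-mean GP to (xs, ys); return a function x -> (mean, std)."""
    xs = list(xs)  # snapshot, so later appends by the caller do not leak in
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper(L, solve_lower(L, ys))

    def predict(x):
        k_star = [rbf(a, x) for a in xs]
        mean = sum(k * w for k, w in zip(k_star, alpha))
        v = solve_lower(L, k_star)
        var = max(1.0 - sum(vi * vi for vi in v), 0.0)  # prior variance is 1
        return mean, math.sqrt(var)

    return predict


def validation_score(x):
    """Hypothetical stand-in for an expensive training run; peaks at x = 0.3."""
    return -(x - 0.3) ** 2


def bayes_opt(objective, n_init=3, n_iter=10, kappa=2.0, seed=0):
    rng = random.Random(seed)
    candidates = [i / 100 for i in range(101)]  # assumed search range [0, 1]
    xs = rng.sample(candidates, n_init)         # a few random initial trials
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        # Step 1: fit the surrogate (scores are centered for the zero-mean GP).
        offset = sum(ys) / len(ys)
        predict = fit_gp(xs, [y - offset for y in ys])

        # Step 2: acquisition. High mean = exploit, high std = explore.
        def ucb(x):
            mean, std = predict(x)
            return mean + kappa * std

        x_next = max((c for c in candidates if c not in xs), key=ucb)

        # Step 3: evaluate the chosen point and add it to the history.
        xs.append(x_next)
        ys.append(objective(x_next))
    history = list(zip(xs, ys))
    best_x, best_y = max(history, key=lambda pair: pair[1])
    return best_x, best_y, history


if __name__ == "__main__":
    best_x, best_y, history = bayes_opt(validation_score)
    print(f"best x = {best_x:.2f}, score = {best_y:.4f}, trials = {len(history)}")
```

The parameter `kappa` controls the balance: larger values weight uncertainty more heavily and push the search toward unexplored regions. In practice the kernel hyperparameters would be fitted to the data rather than fixed, and a library implementation would normally replace the hand-written linear algebra.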

Why it is efficient

It uses past results to decide where to look next, unlike grid and random search, which ignore history.
The acquisition function formalizes the explore-exploit tradeoff.
It shines when each evaluation is costly, such as training large models.
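
As an illustration of how an acquisition function can formalize that tradeoff, one widely used choice is expected improvement (EI). The notation here is an assumption introduced for the example: the score is being maximized, the surrogate's prediction at a point x is Gaussian with mean \mu(x) and standard deviation \sigma(x) > 0, f^{+} is the best score observed so far, and \Phi and \varphi are the standard normal CDF and PDF.

```latex
\mathrm{EI}(x) = \bigl(\mu(x) - f^{+}\bigr)\,\Phi(z) + \sigma(x)\,\varphi(z),
\qquad z = \frac{\mu(x) - f^{+}}{\sigma(x)}
```

The first term grows where the predicted score exceeds the current best (exploitation), and the second grows where the surrogate is uncertain (exploration). The next trial is the point that maximizes EI.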

Caveats

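Fitting and querying the surrogate adds overhead (exact Gaussian process fitting scales cubically with the number of trials), so it is less worthwhile when evaluations are cheap.
It is inherently sequential, because each trial depends on the results of earlier ones; this limits parallelism compared to random search, where all trials are independent.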
Performance depends on sensible search ranges and a suitable surrogate.

Key idea

Bayesian optimization models the score with a surrogate and uses an acquisition function to choose informative trials, typically finding good hyperparameters in far fewer expensive evaluations than grid or random search.
