Bayesian Optimization For Tuning
Bayesian optimization tunes hyperparameters by treating the validation score as an expensive, unknown function of the hyperparameters and learning a probabilistic model of it. Each new trial is placed where that model suggests an evaluation will be most useful, so the method typically needs far fewer evaluations than grid or random search.
The loop
- Fit a surrogate model, often a Gaussian process, that predicts the score and its uncertainty across the hyperparameter space.
- Use an acquisition function to pick the next point, balancing exploitation of promising regions against exploration of uncertain ones.
- Evaluate that point, update the surrogate, and repeat.
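The three steps above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not a production implementation: it tunes a single hyperparameter scaled to [0, 1], uses a Gaussian process with a fixed squared-exponential kernel as the surrogate, and uses expected improvement as the acquisition function. The objective `toy_score`, the kernel length scale, the noise term, and the candidate grid are all hypothetical choices made for the example.

```python
import math

def rbf(a, b, length=0.15):
    """Squared-exponential kernel between two scalar inputs (assumed length scale)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a small positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Surrogate: posterior mean and standard deviation of the score at x_new."""
    offset = sum(ys) / len(ys)          # center the scores around their mean
    centered = [y - offset for y in ys]
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper(L, solve_lower(L, centered))
    k_star = [rbf(a, x_new) for a in xs]
    mean = offset + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Acquisition: expected gain over the best score so far (maximization)."""
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return max((mean - best) * cdf + std * pdf, 0.0)

def toy_score(x):
    """Hypothetical validation score with a global peak near x = 0.7."""
    return math.exp(-((x - 0.7) ** 2) / 0.02) + 0.5 * math.exp(-((x - 0.2) ** 2) / 0.01)

def bayes_opt(objective, n_iter=12):
    candidates = [i / 100 for i in range(101)]                      # assumed search grid
    xs = [candidates[0], candidates[50], candidates[100]]           # initial design
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        best = max(ys)
        untried = [c for c in candidates if c not in xs]
        # Refit the surrogate on all results so far, then maximize the acquisition.
        x_next = max(untried,
                     key=lambda c: expected_improvement(*gp_posterior(xs, ys, c), best))
        xs.append(x_next)                # evaluate the chosen point and record it
        ys.append(objective(x_next))
    return xs, ys

if __name__ == "__main__":
    xs, ys = bayes_opt(toy_score)
    best_y, best_x = max(zip(ys, xs))
    print(f"best score {best_y:.3f} at x = {best_x:.2f} after {len(xs)} evaluations")
```

The sketch refits the Gaussian process from scratch for every candidate, which is wasteful but keeps the code short. In practice one would usually rely on a library for the surrogate and tune its kernel parameters rather than fixing them.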
Why it is efficient
- It uses past results to decide where to look next, unlike grid or random search, which ignore history.
- The acquisition function formalizes the explore-exploit trade-off.
- It shines when each evaluation is costly, such as training large models.
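To make the trade-off concrete, here are two commonly used acquisition functions, written for maximization. The notation is an assumption introduced for this illustration: \mu(x) and \sigma(x) are the surrogate's predicted mean and standard deviation at x, f^{*} is the best score observed so far, \kappa \ge 0 is an exploration weight chosen by the user, and \Phi and \varphi are the standard normal CDF and PDF. The closed form for expected improvement assumes a Gaussian prediction with \sigma(x) > 0.

```latex
% Upper confidence bound: predicted score plus a bonus for uncertainty
\mathrm{UCB}(x) = \mu(x) + \kappa\,\sigma(x)

% Expected improvement over the best score so far
\mathrm{EI}(x) = \bigl(\mu(x) - f^{*}\bigr)\,\Phi(z) + \sigma(x)\,\varphi(z),
\qquad z = \frac{\mu(x) - f^{*}}{\sigma(x)}
```

In both, the term involving \mu(x) rewards points predicted to score well (exploitation) and the term involving \sigma(x) rewards points where the surrogate is unsure (exploration).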
Caveats
- The surrogate adds overhead, so it is less worthwhile when evaluations are cheap.
- It is inherently sequential, because each trial depends on the results of earlier ones, which limits parallelism compared to random search.
- Performance depends on sensible search ranges and a suitable surrogate.
Key idea
Bayesian optimization models the score with a surrogate and uses an acquisition function to choose informative trials, typically finding good hyperparameters in far fewer expensive evaluations than grid or random search.