Stop guessing the learning rate
The learning rate is often the single most important hyperparameter, yet many practitioners still tune it by trial and error. A learning rate finder reveals a good range in a single short run.
The procedure
Start with a tiny learning rate and increase it exponentially after every batch for a few hundred steps, recording the loss at each step. Then plot the loss against the learning rate, with the learning rate on a log-scaled axis.
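The sweep can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the toy linear-regression model, the function names `lr_schedule` and `lr_range_test`, the default range of 1e-5 to 10, and the early-stop threshold (four times the best loss seen so far) are all assumptions made for the example. In a real project the toy model would be replaced by your network and optimizer.

```python
import math
import random


def lr_schedule(lr_min, lr_max, num_steps):
    """Return num_steps learning rates spaced exponentially from lr_min to lr_max."""
    ratio = lr_max / lr_min
    return [lr_min * ratio ** (i / (num_steps - 1)) for i in range(num_steps)]


def lr_range_test(lr_min=1e-5, lr_max=10.0, num_steps=200, batch_size=64, seed=0):
    """Sweep the learning rate while training a toy model y = w * x + b with SGD.

    Returns a list of (learning_rate, batch_loss) pairs, one per step taken.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0          # toy model parameters (stand-in for a real network)
    best_loss = math.inf
    history = []

    for lr in lr_schedule(lr_min, lr_max, num_steps):
        # Fresh mini-batch from the synthetic target y = 3x + 1 + noise.
        xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        ys = [3.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]

        # Mean squared error on this batch, measured before the update.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / batch_size
        history.append((lr, loss))

        # Assumed stopping rule: quit once the loss is far above the best so far.
        if not math.isfinite(loss) or loss > 4.0 * best_loss:
            break
        best_loss = min(best_loss, loss)

        # One SGD step using the current learning rate.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / batch_size
        grad_b = 2.0 * sum(errors) / batch_size
        w -= lr * grad_w
        b -= lr * grad_b

    return history


history = lr_range_test()
```

Plotting the second element of each pair against the first, with a log-scaled horizontal axis, gives the curve to inspect.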
Reading the curve
- The loss is flat when the rate is too small to make progress.
- It drops steeply through the useful range.
- It shoots up once the rate is too large and training diverges.
Choosing the value
- A good choice sits on the steepest part of the descent, often about one order of magnitude below the rate at which the loss reaches its minimum.
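Both rules of thumb can be read off the recorded curve programmatically. The sketch below is one possible way to do it, assuming you already have parallel lists of learning rates and losses from a sweep. The helper names `smooth` and `suggest_lr`, the smoothing factor of 0.9, and the synthetic `toy_loss` curve are hypothetical choices for illustration. Smoothing reduces batch noise but makes the curve lag slightly behind the raw losses.

```python
import math


def smooth(values, beta=0.9):
    """Exponential moving average with bias correction."""
    avg, out = 0.0, []
    for i, v in enumerate(values, start=1):
        avg = beta * avg + (1.0 - beta) * v
        out.append(avg / (1.0 - beta ** i))
    return out


def suggest_lr(lrs, losses, beta=0.9):
    """Return (lr at the steepest descent, one tenth of the lr at the minimum loss)."""
    smoothed = smooth(losses, beta)

    # Learning rate where the smoothed loss is lowest.
    i_min = min(range(len(smoothed)), key=smoothed.__getitem__)

    # Most negative slope of loss versus log10(lr), searched up to the minimum.
    best_slope, lr_steepest = 0.0, lrs[0]
    for i in range(1, i_min + 1):
        d_log_lr = math.log10(lrs[i]) - math.log10(lrs[i - 1])
        slope = (smoothed[i] - smoothed[i - 1]) / d_log_lr
        if slope < best_slope:
            best_slope, lr_steepest = slope, lrs[i]

    return lr_steepest, lrs[i_min] / 10.0


def toy_loss(t):
    """Synthetic curve over t = log10(lr): flat, then falling, then rising."""
    if t < -3.0:
        return 4.0
    if t < -1.0:
        return 4.0 - 1.5 * (t + 3.0)
    return 1.0 + 5.0 * (t + 1.0) ** 2


exponents = [-5.0 + 6.0 * i / 99 for i in range(100)]
lrs = [10.0 ** t for t in exponents]
losses = [toy_loss(t) for t in exponents]
steepest_lr, tenth_of_min_lr = suggest_lr(lrs, losses)
```

The two numbers usually land in the same neighborhood. Treat them as starting points to sanity-check against the plot, not as a substitute for looking at it.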
Practical notes
- The run is cheap: just a few hundred batches.
- Rerun it if you change the architecture, batch size, or optimizer.
- It pairs naturally with schedules such as the one-cycle policy, which need a sensible peak rate.
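To make the last point concrete, here is a simplified sketch of how a value picked from the finder could serve as the peak of a one-cycle-style schedule. The function `one_cycle_lr`, the linear warm-up followed by cosine decay, and the default divisors are assumptions for illustration. Library implementations differ in their details.

```python
import math


def one_cycle_lr(step, total_steps, peak_lr,
                 warmup_frac=0.3, start_div=25.0, final_div=1e4):
    """Learning rate at a 0-based step of a simplified one-cycle schedule."""
    start_lr = peak_lr / start_div      # rate at the very first step
    final_lr = start_lr / final_div     # rate at the very last step
    warmup_steps = max(1, round(warmup_frac * total_steps))

    if step < warmup_steps:
        # Linear ramp from start_lr up to peak_lr.
        frac = step / warmup_steps
        return start_lr + frac * (peak_lr - start_lr)

    # Cosine decay from peak_lr down to final_lr.
    frac = (step - warmup_steps) / max(1, total_steps - warmup_steps - 1)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * frac))


# Example: 100 training steps with a peak rate taken from the finder.
rates = [one_cycle_lr(s, total_steps=100, peak_lr=0.01) for s in range(100)]
```

The finder only supplies `peak_lr`. The warm-up fraction and the two divisors remain separate choices.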
Key idea
The learning rate finder sweeps the rate exponentially and plots loss to reveal the useful band. Pick a value on the steepest part of the descent, below where loss starts to diverge.