The Learning Rate Finder

Stop guessing the learning rate

The learning rate is often the single most important hyperparameter to get right, yet many practitioners still tune it by trial and error. The learning rate finder identifies a good range in a single short training run.

The procedure

Start with a tiny learning rate and multiply it by a constant factor after every batch, so that it grows exponentially over a few hundred steps, recording the loss at each step. Then plot the loss against the learning rate, using a logarithmic scale for the learning-rate axis.
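
A minimal sketch of the sweep in plain Python, assuming vanilla SGD and a hypothetical `loss_and_grad(params)` callback that draws the next mini-batch and returns its loss and gradients. The multiplier `growth` is chosen so the rate climbs from `lr_min` to `lr_max` in `num_steps` batches, i.e. the rate at step i is lr_min * (lr_max / lr_min) ** (i / (num_steps - 1)). Stopping once the loss exceeds `diverge_factor` times the best value seen so far is a common heuristic, not part of the procedure itself. All names, defaults, and the toy regression task are illustrative assumptions.

```python
import math
import random


def lr_range_test(loss_and_grad, params, lr_min=1e-7, lr_max=10.0,
                  num_steps=200, diverge_factor=4.0):
    """Sweep the learning rate exponentially, taking one SGD step per batch.

    loss_and_grad(params) is a hypothetical callback that draws the next
    mini-batch and returns (loss, grads). Returns parallel lists of the
    learning rates used and the losses recorded.
    """
    # Constant per-batch multiplier: lr_min * growth**(num_steps - 1) == lr_max.
    growth = (lr_max / lr_min) ** (1.0 / (num_steps - 1))
    lrs, losses = [], []
    best = math.inf
    lr = lr_min
    for _ in range(num_steps):
        loss, grads = loss_and_grad(params)
        lrs.append(lr)
        losses.append(loss)
        best = min(best, loss)
        # Heuristic early stop: the loss is no longer finite or has blown up.
        if not math.isfinite(loss) or loss > diverge_factor * best:
            break
        # Plain SGD step; builds a new list, so the caller's weights are untouched.
        params = [p - lr * g for p, g in zip(params, grads)]
        lr *= growth
    return lrs, losses


def make_toy_problem(seed=0, batch_size=32):
    """Hypothetical toy task: fit y = 3x + 1 (plus noise) with mean squared error."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
    data = [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]

    def loss_and_grad(params):
        w, b = params
        batch = rng.sample(data, batch_size)
        errs = [(w * x + b - y, x) for x, y in batch]
        n = len(errs)
        loss = sum(e * e for e, _ in errs) / n
        grad_w = sum(2.0 * e * x for e, x in errs) / n
        grad_b = sum(2.0 * e for e, _ in errs) / n
        return loss, [grad_w, grad_b]

    return loss_and_grad


if __name__ == "__main__":
    lrs, losses = lr_range_test(make_toy_problem(), [0.0, 0.0])
    for lr, loss in zip(lrs[::10], losses[::10]):
        print(f"lr={lr:.2e}  loss={loss:.4f}")
```

Because the update builds a new parameter list, there is nothing to restore after the sweep; with a real framework you would typically save the model and optimizer state beforehand and reload it afterward, since the sweep ends in a diverged state.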

Reading the curve

The loss is flat when the rate is too small to make progress.
It drops steeply through the useful range.
It shoots up once the rate is too large and training diverges.

Choosing the value

A good choice sits on the steepest part of the descent, often about one order of magnitude below the learning rate at which the loss reaches its minimum.
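
One way to turn a recorded curve into a number, sketched under assumptions: the raw losses are first smoothed with a bias-corrected exponential moving average (the default `beta` is an illustrative choice), then the slope with respect to log10 of the learning rate is estimated by finite differences. `suggest_lr` is a hypothetical helper that returns both the start of the steepest segment and the loss-minimizing rate divided by ten, matching the two rules of thumb above.

```python
import math


def smooth(losses, beta=0.98):
    """Exponentially weighted moving average with bias correction."""
    avg, out = 0.0, []
    for i, loss in enumerate(losses, start=1):
        avg = beta * avg + (1.0 - beta) * loss
        # Dividing by (1 - beta**i) removes the bias toward the 0.0 initial value.
        out.append(avg / (1.0 - beta ** i))
    return out


def suggest_lr(lrs, losses, beta=0.98):
    """Return (steepest, min_over_10) learning-rate suggestions.

    Needs at least two points. With beta=0.0 no smoothing is applied.
    """
    s = smooth(losses, beta)
    logs = [math.log10(lr) for lr in lrs]
    # Finite-difference slope of the smoothed loss per decade of learning rate.
    slopes = [(s[i + 1] - s[i]) / (logs[i + 1] - logs[i])
              for i in range(len(s) - 1)]
    # Left end of the segment where the loss falls fastest.
    steepest = lrs[min(range(len(slopes)), key=slopes.__getitem__)]
    # Rule of thumb: one order of magnitude below the loss minimum.
    min_over_10 = lrs[min(range(len(s)), key=s.__getitem__)] / 10.0
    return steepest, min_over_10


if __name__ == "__main__":
    # Made-up curve for illustration only: flat, steep drop, then divergence.
    lrs = [10.0 ** k for k in range(-6, 1)]
    losses = [2.0, 2.0, 1.9, 1.0, 0.5, 0.4, 3.0]
    print(suggest_lr(lrs, losses, beta=0.0))
```

On the made-up curve the steepest segment starts at about 1e-4 and the loss minimum sits at 1e-1, so the two suggestions come out near 1e-4 and 1e-2. On real, noisy curves the smoothing matters more, and the two heuristics can disagree; plotting the curve remains the safest check.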

Practical notes

The run is cheap: only a few hundred batches.
Rerun it if you change the architecture, batch size, or optimizer.
It pairs naturally with schedules such as the one-cycle policy, which need a sensible peak learning rate.
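
To illustrate how a value from the finder is consumed, here is a sketch of a simplified one-cycle schedule that takes it as `max_lr`: a cosine warm-up from `max_lr / div_factor` to the peak over the first `pct_start` fraction of steps, then cosine annealing down to a much smaller final rate. The function name is hypothetical and its defaults are assumptions modeled on PyTorch's `OneCycleLR` (`pct_start=0.3`, `div_factor=25`, `final_div_factor=1e4`); unlike the full policy, it does not cycle momentum.

```python
import math


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` (0-based) for a cosine one-cycle schedule."""
    start_lr = max_lr / div_factor
    end_lr = start_lr / final_div_factor
    warmup = max(1, round(pct_start * total_steps))
    if step < warmup:
        # Phase 1: cosine ramp from start_lr up to max_lr.
        t = step / warmup
        return start_lr + (max_lr - start_lr) * (1.0 - math.cos(math.pi * t)) / 2.0
    # Phase 2: cosine decay from max_lr down to end_lr at the last step.
    t = (step - warmup) / max(1, total_steps - 1 - warmup)
    return end_lr + (max_lr - end_lr) * (1.0 + math.cos(math.pi * t)) / 2.0


if __name__ == "__main__":
    peak = 1e-2  # e.g. the value picked from the finder's curve
    schedule = [one_cycle_lr(s, 100, peak) for s in range(100)]
    print(f"start={schedule[0]:.2e} peak={max(schedule):.2e} end={schedule[-1]:.2e}")
```

The finder's job is only to supply `peak`: a value taken from the steep part of the curve keeps the schedule's high-rate phase productive without tipping training into divergence.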

Key idea

The learning rate finder sweeps the rate exponentially and plots loss to reveal the useful band. Pick a value on the steepest part of the descent, below where loss starts to diverge.