Machine Learning

The Learning Rate Finder

Sweeping the learning rate to read a good value straight off the loss curve.

Stop guessing the learning rate

The learning rate is often the single most influential hyperparameter, yet many practitioners still tune it by trial and error. The learning rate finder reveals a good range in a single short run.

The procedure

Start with a tiny learning rate and multiply it by a constant factor after every batch, so that it grows exponentially over a few hundred steps. Record the loss at each step, then plot the loss against the learning rate with the learning rate on a log axis.
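The procedure above can be sketched in a few lines. This is a minimal illustration in plain Python, not a reference implementation: the function names `lr_sweep` and `make_toy_problem`, the default range of 1e-6 to 10, the 200 steps, and the rule "stop once the loss exceeds four times the best loss seen" are all assumptions made for the example. The toy model is a two-parameter linear fit trained with minibatch SGD.

```python
import math
import random


def lr_sweep(loss_and_grad, w0, lr_min=1e-6, lr_max=10.0, num_steps=200, blowup=4.0):
    """Run SGD while multiplying the learning rate by a constant factor each step.

    `loss_and_grad(w)` is a hypothetical callback returning (loss, gradient list)
    for one batch. Returns the learning rates and losses recorded per step.
    """
    # Constant factor that takes lr_min to lr_max in exactly num_steps - 1 multiplications.
    growth = (lr_max / lr_min) ** (1.0 / (num_steps - 1))
    w, lr = list(w0), lr_min
    lrs, losses = [], []
    best = math.inf
    for _ in range(num_steps):
        loss, grad = loss_and_grad(w)
        # Assumed stopping rule: quit once the loss is non-finite or has clearly diverged.
        if not math.isfinite(loss) or loss > blowup * best:
            break
        best = min(best, loss)
        lrs.append(lr)
        losses.append(loss)
        # Plain SGD update at the current learning rate, then raise the rate.
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
        lr *= growth
    return lrs, losses


def make_toy_problem(seed=0, n=512, batch_size=32):
    """Toy regression y = 3x + 1 + noise, with mean squared error on random minibatches."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]

    def loss_and_grad(w):
        loss = grad_slope = grad_bias = 0.0
        for _ in range(batch_size):
            i = rng.randrange(n)
            err = w[0] * xs[i] + w[1] - ys[i]
            loss += err * err
            grad_slope += 2.0 * err * xs[i]
            grad_bias += 2.0 * err
        return loss / batch_size, [grad_slope / batch_size, grad_bias / batch_size]

    return loss_and_grad


lrs, losses = lr_sweep(make_toy_problem(), w0=[0.0, 0.0])
```

Plotting `losses` against `lrs` with a logarithmic x-axis gives the curve described in the next section. In a real training loop the same idea applies, with the framework's optimizer step in place of the hand-written update.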

Reading the curve

  • The loss is flat when the rate is too small to make progress.
  • It drops steeply through the useful range.
  • It shoots up once the rate is too large and training diverges.

Choosing the value

  • A good choice sits on the steepest part of the descent, often about one order of magnitude below the learning rate at which the loss reaches its minimum.
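Both rules of thumb can be read off the recorded curve programmatically. The sketch below assumes you already have the learning rates and losses from a sweep as two lists. The names `smooth` and `suggest_lr`, the smoothing factor of 0.9, and the choice to report the left end of the steepest interval are assumptions for illustration; the six-point curve at the end is made up to show the call, not measured data.

```python
import math


def smooth(values, beta=0.9):
    """Exponential moving average with bias correction, to tame batch noise."""
    avg, out = 0.0, []
    for i, v in enumerate(values, start=1):
        avg = beta * avg + (1.0 - beta) * v
        out.append(avg / (1.0 - beta ** i))
    return out


def suggest_lr(lrs, losses, beta=0.9):
    """Return two candidate learning rates from a sweep.

    'steepest':    the rate where the smoothed loss falls fastest per decade of lr
    'min_over_10': one order of magnitude below the rate with the lowest smoothed loss
    """
    if len(lrs) < 2 or len(lrs) != len(losses):
        raise ValueError("need at least two matching (lr, loss) points")
    sm = smooth(losses, beta)
    # Slope of the smoothed loss with respect to log10(lr) on each interval.
    slopes = [
        (sm[i + 1] - sm[i]) / (math.log10(lrs[i + 1]) - math.log10(lrs[i]))
        for i in range(len(sm) - 1)
    ]
    steepest = min(range(len(slopes)), key=slopes.__getitem__)
    lowest = min(range(len(sm)), key=sm.__getitem__)
    # Report the left end of the steepest interval (an arbitrary but simple choice).
    return {"steepest": lrs[steepest], "min_over_10": lrs[lowest] / 10.0}


# Made-up curve: flat, then a steep drop, then divergence at the largest rate.
example_lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
example_losses = [2.0, 2.0, 1.9, 1.0, 0.5, 3.0]
suggestion = suggest_lr(example_lrs, example_losses, beta=0.0)  # beta=0 disables smoothing
```

The two candidates usually land in the same neighborhood. Treat them as a starting point and confirm by looking at the plot, since a noisy curve can put the steepest slope in a misleading place.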

Practical notes

  • The run is cheap, just a few hundred batches.
  • Rerun it if you change architecture, batch size, or optimizer.
  • It pairs naturally with schedules such as one-cycle, which need a sensible peak rate.
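To show where the value from the finder plugs in, here is a rough sketch of a one-cycle style schedule that warms up to a peak rate and then anneals well below it. The function name `one_cycle_lr`, the cosine shape, the 30% warm-up fraction, and the divisors 25 and 1e4 are assumptions chosen for the example; library implementations differ in these details.

```python
import math


def one_cycle_lr(step, total_steps, peak_lr, warmup_frac=0.3, start_div=25.0, end_div=1e4):
    """Learning rate at `step` for a warm-up-then-anneal schedule.

    Rises from peak_lr / start_div to peak_lr during warm-up, then falls to
    peak_lr / end_div by the final step, using cosine interpolation in both phases.
    """
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        t = step / warmup_steps
        begin, end = peak_lr / start_div, peak_lr
    else:
        t = (step - warmup_steps) / max(1, total_steps - warmup_steps - 1)
        begin, end = peak_lr, peak_lr / end_div
    # Cosine interpolation: equals `begin` at t = 0 and `end` at t = 1.
    return end + (begin - end) * (1.0 + math.cos(math.pi * t)) / 2.0


# peak_lr would be the value chosen from the finder's curve; 0.01 is a placeholder.
schedule = [one_cycle_lr(s, total_steps=100, peak_lr=0.01) for s in range(100)]
```

The only input that has to come from the finder is `peak_lr`; the rest of the schedule is derived from it.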

Key idea

The learning rate finder sweeps the rate exponentially and plots loss to reveal the useful band. Pick a value on the steepest part of the descent, below where loss starts to diverge.

Check yourself

1. How does the learning rate finder vary the rate during its run?

2. Where on the loss versus learning rate curve should you choose a value?