Machine Learning

The One Cycle Policy

A single rise and fall of the learning rate for fast, well-regularized training.

A schedule shaped like a hill

The one cycle policy runs the learning rate up from a low value to a high peak and then back down below the start, all within one training run. Momentum moves in the opposite direction, falling as the rate rises and rising again as it falls.

The two phases

  • Warmup: the rate climbs to its peak, letting the model explore boldly and escape sharp minima.
  • Annealing: the rate decays to near zero, settling the model into a flat minimum that generalizes well.

The schedule
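
The two phases can be sketched as a small function that returns the learning rate and momentum for any step. This is a minimal illustration, not a reference implementation: the function name one_cycle, the cosine interpolation, the 30% warmup fraction, the divisors 25 and 1e4, and the momentum range 0.85 to 0.95 are all assumptions chosen for the example.

```python
import math


def one_cycle(step, total_steps, max_lr=0.1, div_factor=25.0,
              final_div_factor=1e4, warmup_frac=0.3,
              low_momentum=0.85, high_momentum=0.95):
    """Return (learning_rate, momentum) for a 0-based training step."""
    start_lr = max_lr / div_factor          # low starting rate (assumed ratio)
    end_lr = start_lr / final_div_factor    # final rate, far below the start
    warmup_steps = max(1, round(warmup_frac * total_steps))
    anneal_steps = max(1, total_steps - 1 - warmup_steps)

    def cosine(a, b, t):
        # Smoothly interpolate from a (t = 0) to b (t = 1).
        return b + (a - b) * (1.0 + math.cos(math.pi * t)) / 2.0

    if step < warmup_steps:
        # Warmup: the rate rises to the peak while momentum falls.
        t = step / warmup_steps
        return cosine(start_lr, max_lr, t), cosine(high_momentum, low_momentum, t)
    # Annealing: the rate decays toward end_lr while momentum rises again.
    t = min(1.0, (step - warmup_steps) / anneal_steps)
    return cosine(max_lr, end_lr, t), cosine(low_momentum, high_momentum, t)


# Hypothetical budget: 10 epochs of 10 batches each.
total_steps = 10 * 10
for step in (0, 15, 30, 65, 99):
    lr, momentum = one_cycle(step, total_steps)
    print(f"step {step:3d}  lr {lr:.6f}  momentum {momentum:.3f}")
```

With these assumed settings the rate starts at 0.004, peaks at 0.1 on step 30, and ends at 4e-7, well below the start, while momentum moves from 0.95 down to 0.85 and back.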

Why it works

  • The high peak acts as regularization, discouraging the model from overfitting early.
  • The final low phase fine-tunes the model into a stable solution.
  • It often reaches strong accuracy in far fewer epochs, an effect sometimes called super-convergence.

Practical notes

  • Find the peak with a learning rate finder, then set it as the cycle maximum.
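  • Cycling momentum inversely to the rate keeps the effective step size smoother: momentum is lowest when the rate is highest.
  • It works best with a fixed epoch budget, since the schedule spans the whole run and its length must be set up front.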
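
A rough way to see the point about effective step size is to assume classical heavy-ball momentum, where under a constant gradient the step per unit of gradient approaches lr / (1 - momentum). The sketch below compares the start of the cycle with the peak, with and without inverse momentum cycling. The helper effective_step and all numeric values are illustrative assumptions, not measurements.

```python
def effective_step(lr, momentum):
    # Assumed model: heavy-ball SGD (v = momentum * v + grad; w -= lr * v).
    # Under a constant gradient, v settles at grad / (1 - momentum), so the
    # step per unit of gradient approaches lr / (1 - momentum).
    return lr / (1.0 - momentum)


# Illustrative values: start rate 0.004, peak rate 0.1, momentum 0.85 to 0.95.
start = effective_step(0.004, 0.95)        # low rate, high momentum
peak_cycled = effective_step(0.1, 0.85)    # peak rate, momentum cycled down
peak_fixed = effective_step(0.1, 0.95)     # peak rate, momentum held high

print(f"start:               {start:.3f}")
print(f"peak, cycled:        {peak_cycled:.3f}")
print(f"peak, fixed at 0.95: {peak_fixed:.3f}")
```

Under these assumptions the effective step grows roughly 8 times from start to peak when momentum is cycled, compared with 25 times when momentum is held at 0.95.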

Key idea

The one cycle policy warms the learning rate up to a peak, then anneals it below its starting value within a single run, with momentum cycled inversely. The high peak regularizes, and the low tail settles into a good minimum.

Check yourself

1. What is the shape of the learning rate in the one cycle policy?

2. How does momentum behave during the one cycle schedule?