The One Cycle Policy

A single rise and fall of the learning rate for fast, well-regularized training.

A schedule shaped like a hill

The one cycle policy runs the learning rate up from a low value to a high peak and then back down below the start, all within one training run. Momentum moves in the opposite direction, falling as the rate rises and rising again as it falls.

The two phases

Warmup: the rate climbs to its peak, letting the model explore boldly and escape sharp minima.
Annealing: the rate decays to near zero, letting the model settle into a flat, well-generalizing minimum.

The schedule
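
As a rough illustration, the schedule can be written as a function of the training step. This is a minimal sketch, not a reference implementation: the linear warmup, the cosine annealing, the function name one_cycle, and every default value (pct_start, div_factor, final_div_factor, and the momentum bounds) are assumptions chosen for illustration and do not come from the text above.

```python
import math


def one_cycle(step, total_steps, max_lr, pct_start=0.3,
              div_factor=25.0, final_div_factor=1e4,
              base_momentum=0.85, max_momentum=0.95):
    """Return (lr, momentum) for a 0-indexed step of a one cycle schedule.

    All defaults are illustrative assumptions, not prescribed values.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")

    initial_lr = max_lr / div_factor            # starting rate
    final_lr = initial_lr / final_div_factor    # ending rate, below the start
    peak_step = max(1, round(pct_start * total_steps))
    last_step = total_steps - 1

    if step <= peak_step:
        # Warmup: the rate rises linearly while momentum falls.
        t = step / peak_step
        lr = initial_lr + t * (max_lr - initial_lr)
        momentum = max_momentum - t * (max_momentum - base_momentum)
    else:
        # Annealing: cosine decay of the rate while momentum rises again.
        t = (step - peak_step) / (last_step - peak_step)
        cos = 0.5 * (1.0 + math.cos(math.pi * t))   # goes from 1 down to 0
        lr = final_lr + cos * (max_lr - final_lr)
        momentum = max_momentum - cos * (max_momentum - base_momentum)
    return lr, momentum


# Example: tabulate the whole run once, then look values up per step.
schedule = [one_cycle(s, total_steps=100, max_lr=0.1) for s in range(100)]
```

With max_lr=0.1 and 100 steps, this sketch starts at 0.1/25 = 0.004, peaks at 0.1 on step 30, and ends at 0.004/1e4 = 4e-7, well below the start. Momentum is at its assumed maximum of 0.95 at both ends and at its minimum of 0.85 at the peak.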

Why it works

The high peak acts as regularization, discouraging the model from overfitting early.
The final low-rate phase fine-tunes the weights toward a stable solution.
It often reaches strong accuracy in far fewer epochs than a conventional schedule needs, an effect sometimes called super-convergence.

Practical notes

Find the peak with a learning rate finder, then set it as the cycle maximum.
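Cycling momentum inversely keeps the effective step size, roughly the learning rate divided by (1 - momentum) for SGD with momentum, from swinging as widely as the rate itself.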
Works best with an epoch budget fixed in advance, since the schedule spans the whole run and needs the total step count up front.
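
A learning rate finder of the kind mentioned above can be sketched as a sweep that raises the rate geometrically and stops once the loss blows up. The sketch below runs on a toy one-parameter quadratic with plain gradient descent so that it stays self-contained. The names lr_range_test, suggest_peak, and quadratic are hypothetical, and picking the rate at the lowest recorded loss divided by ten is one common rule of thumb assumed here, not a rule from the text. Real finders work on noisy mini-batch losses and usually smooth them first.

```python
import math


def lr_range_test(loss_and_grad, w0, min_lr=1e-5, max_lr=10.0,
                  num_steps=100, diverge_factor=4.0):
    """Sweep the learning rate geometrically from min_lr to max_lr.

    Takes one gradient descent update per step (num_steps must be >= 2)
    and returns the (lr, loss) pairs recorded before the loss diverged.
    """
    growth = (max_lr / min_lr) ** (1.0 / (num_steps - 1))
    w, lr, best = w0, min_lr, math.inf
    history = []
    for _ in range(num_steps):
        loss, grad = loss_and_grad(w)
        if not math.isfinite(loss) or loss > diverge_factor * best:
            break                      # loss has blown up: stop the sweep
        best = min(best, loss)
        history.append((lr, loss))
        w -= lr * grad                 # plain gradient descent update
        lr *= growth
    return history


def suggest_peak(history, safety=10.0):
    """Heuristic (an assumption): rate at the lowest loss, divided by safety."""
    best_lr, _ = min(history, key=lambda pair: pair[1])
    return best_lr / safety


def quadratic(w, a=2.0):
    """Toy loss 0.5 * a * w**2 and its gradient; stable only for lr < 2 / a."""
    return 0.5 * a * w * w, a * w


history = lr_range_test(quadratic, w0=1.0)
peak_lr = suggest_peak(history)   # candidate value for the cycle maximum
```

On this toy problem gradient descent is stable only below a rate of 2/a = 1.0, and the suggested peak lands under that threshold. The value would then be passed as the cycle maximum of whatever schedule is in use.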

Key idea

One cycle warms the learning rate up to a peak then anneals it below the start within a single run, with momentum cycled inversely. The high peak regularizes and the low tail settles into a good minimum.