Learning Rate Effects

The single most important knob

The learning rate sets how far each gradient step moves. It is often the hyperparameter that most affects whether training succeeds.

Too high: steps overshoot the valley, loss oscillates or diverges.
Too low: progress is painfully slow and may stall in a plateau.
Just right: steady, fast descent toward a minimum.
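
The three regimes can be seen on a toy problem. The sketch below is illustrative only: it assumes a one-dimensional loss f(x) = x**2, and the function name gradient_descent and the specific rates (1.1, 0.001, 0.3) are choices made for this example, not recommended values for real models.

```python
def gradient_descent(lr, steps=50, x0=1.0):
    """Minimize the toy loss f(x) = x**2 and return the loss after each step."""
    x = x0
    losses = []
    for _ in range(steps):
        grad = 2.0 * x        # derivative of x**2
        x = x - lr * grad     # the learning rate scales the step
        losses.append(x * x)
    return losses


# Rates chosen for this toy loss only (assumption, not general advice).
too_high = gradient_descent(lr=1.1)
too_low = gradient_descent(lr=0.001)
just_right = gradient_descent(lr=0.3)

for name, losses in [("too high", too_high),
                     ("too low", too_low),
                     ("just right", just_right)]:
    print(f"{name:>10}: first loss {losses[0]:.3g}, last loss {losses[-1]:.3g}")
```

On this loss the high rate makes the loss grow at every step, the low rate leaves most of the loss in place after 50 steps, and the middle rate drives it to nearly zero.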

What goes wrong

With a rate that is too large, each step can overshoot the bottom of the loss valley and land higher on the far wall than where it started, so the loss curve climbs or swings wildly. With a rate that is too small, each step barely reduces the loss, so the curve looks flat long before it reaches a minimum.
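
A worked case makes the thresholds concrete. As an assumption for illustration, take a one-dimensional quadratic loss with curvature a > 0 and learning rate \eta (these symbols are introduced here, not taken from the text above):

```latex
\begin{align*}
L(\theta) &= \tfrac{a}{2}\,\theta^{2}, \qquad L'(\theta) = a\,\theta \\
\theta_{t+1} &= \theta_t - \eta\, L'(\theta_t) = (1 - \eta a)\,\theta_t \\
\theta_t &= (1 - \eta a)^{t}\,\theta_0
\end{align*}
```

The iterates shrink only when |1 - \eta a| < 1, that is, when 0 < \eta < 2/a. For \eta > 2/a the factor has magnitude above one and flips sign each step, which is the bouncing-higher behavior. For \eta much smaller than 1/a the factor sits just below one, so progress is slow. Real loss surfaces are not quadratic, so this is a local picture rather than a rule for choosing a rate.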

Tuning strategies

Try a range of rates spaced on a log scale, such as successive factors of ten.
Use a warmup: start with a small rate and grow it to the base rate over the first steps.
Decay the rate over time so steps shrink near a minimum.
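
These strategies can be sketched in a few lines. The block below is a minimal illustration under stated assumptions: the name lr_at_step, the linear warmup, the cosine decay shape, and all default values are choices made for this example. Other decay shapes (step, exponential, linear) are just as common.

```python
import math

# Candidate base rates spaced by factors of ten: 1e-5, 1e-4, ..., 1e-1.
candidate_rates = [10.0 ** exponent for exponent in range(-5, 0)]


def lr_at_step(step, base_lr=0.1, warmup_steps=100, total_steps=1000):
    """Linear warmup up to base_lr, then cosine decay toward zero.

    Hypothetical helper for illustration; the schedule shape is an assumption.
    """
    if step < warmup_steps:
        # Warmup: grow linearly from a small rate to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Decay: fraction of the post-warmup phase completed, clipped to 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 550, 1000):
    print(step, round(lr_at_step(step), 5))
```

In a sweep, each value in candidate_rates would be used as base_lr for a short training run, and the runs compared by their loss curves.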

Adaptive optimizers like Adam adjust an effective rate per parameter, but a sensible base rate still matters.
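
To see what "per parameter" means, here is a hand-written sketch of a single Adam update in plain Python. It is not any library's API: the function adam_step and the two example gradients are hypothetical, and the defaults shown are the commonly used Adam settings.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; m and v are running moment lists, updated in place."""
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g        # running mean of gradients
        v[i] = beta2 * v[i] + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m[i] / (1 - beta1 ** t)              # bias correction for early steps
        v_hat = v[i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


# Two parameters whose gradients differ by four orders of magnitude (made-up values).
params = [1.0, 1.0]
grads = [100.0, 0.01]
m, v = [0.0, 0.0], [0.0, 0.0]

updated = adam_step(params, grads, m, v, t=1, lr=0.001)
step_sizes = [abs(before - after) for before, after in zip(params, updated)]
print(step_sizes)
```

On the first step both parameters move by roughly lr even though their gradients differ widely: the division by the square root of the second moment rescales each parameter separately, while the base rate still sets the overall step size.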

Key idea

The learning rate controls step size: too large diverges, too small crawls, so tuning and scheduling it is central to making gradient descent converge well.
