Machine Learning

Effects of the Learning Rate

Too high diverges, too low crawls, so the step size makes or breaks training.

The single most important knob

The learning rate sets how far each gradient step moves the parameters: every update is the gradient scaled by this rate. It is often the hyperparameter that most affects whether training succeeds.

  • Too high: steps overshoot the valley, loss oscillates or diverges.
  • Too low: progress is painfully slow and may stall on a plateau.
  • Just right: steady, fast descent toward a minimum.

What goes wrong

With a rate that is too large, each step can overshoot the minimum and land on the far wall of the loss valley, ending up higher each time. The loss curve climbs or swings wildly. With a rate that is too small, the loss curve flattens early and barely moves.
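The three regimes can be reproduced on a toy problem. The sketch below is illustrative only: it assumes a one-dimensional loss f(x) = x^2 and plain gradient descent, and the specific rates (1.1, 0.001, 0.3) are picked for this toy loss, not recommended values for any real model.

```python
def loss(x):
    # Toy one-dimensional "valley": f(x) = x^2, with its minimum at x = 0.
    return x * x


def grad(x):
    # Derivative of x^2.
    return 2.0 * x


def run_gradient_descent(learning_rate, steps=20, x0=1.0):
    """Run plain gradient descent and return the loss before and after each step."""
    x = x0
    history = [loss(x)]
    for _ in range(steps):
        x = x - learning_rate * grad(x)  # the update: gradient scaled by the rate
        history.append(loss(x))
    return history


# Hypothetical rates chosen to show each regime on this toy loss.
too_high = run_gradient_descent(1.1)     # overshoots further on every step
too_low = run_gradient_descent(0.001)    # barely moves in 20 steps
just_right = run_gradient_descent(0.3)   # shrinks quickly toward the minimum

for name, hist in [("too high", too_high), ("too low", too_low), ("just right", just_right)]:
    print(f"{name:>10}: start loss = {hist[0]:.3g}, end loss = {hist[-1]:.3g}")
```

On this loss each step multiplies x by (1 - 2 × rate), so any rate above 1.0 makes the magnitude of x grow, while a tiny rate leaves it almost unchanged. Real loss surfaces are not this simple, but the same overshoot and crawl patterns show up in their loss curves.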

Tuning strategies

  • Try a range on a log scale, such as factors of ten.
  • Use a warmup: start with a small rate and raise it to the target over the first steps.
  • Decay the rate over time so steps shrink near a minimum.
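A minimal sketch of these three strategies follows. The function name `learning_rate_at`, the sweep range, and the choice of linear warmup followed by cosine decay are assumptions made for illustration; other warmup and decay shapes (step, exponential, linear) are also common.

```python
import math

# Log-scale sweep: candidate base rates spaced by factors of ten.
# The range 1e-5 to 1e-1 is an illustrative assumption.
candidate_rates = [10.0 ** exponent for exponent in range(-5, 0)]


def learning_rate_at(step, base_rate, warmup_steps, total_steps):
    """Linear warmup up to base_rate, then cosine decay toward zero.

    Hypothetical helper: one common schedule shape, not the only option.
    """
    if step < warmup_steps:
        # Warmup: grow linearly from a small rate up to base_rate.
        return base_rate * (step + 1) / warmup_steps
    # Decay: shrink the rate smoothly so steps get smaller near a minimum.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return 0.5 * base_rate * (1.0 + math.cos(math.pi * progress))


# Print the schedule at a few points for one candidate base rate.
for step in (0, 5, 9, 10, 55, 100):
    rate = learning_rate_at(step, base_rate=0.1, warmup_steps=10, total_steps=100)
    print(f"step {step:>3}: rate = {rate:.4f}")
```

In practice each candidate base rate would be tried with the same schedule, and the one with the best validation loss kept.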

Adaptive optimizers like Adam adjust an effective rate per parameter, but a sensible base rate still matters.
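To see why the base rate still matters, it helps to look at Adam's very first update for a single parameter. The sketch below follows the standard Adam update rule with its commonly used default coefficients; the helper name `adam_first_step` and the example gradients are hypothetical, and only step t = 1 is modeled.

```python
import math


def adam_first_step(gradient, base_rate, beta1=0.9, beta2=0.999, eps=1e-8):
    """Update Adam applies to one parameter on its first step (t = 1)."""
    m = (1.0 - beta1) * gradient           # first moment, starting from zero
    v = (1.0 - beta2) * gradient ** 2      # second moment, starting from zero
    m_hat = m / (1.0 - beta1)              # bias correction at t = 1
    v_hat = v / (1.0 - beta2)
    return base_rate * m_hat / (math.sqrt(v_hat) + eps)


base_rate = 1e-3
for gradient in (100.0, 0.01):
    sgd_step = base_rate * gradient                   # plain gradient descent
    adam_step = adam_first_step(gradient, base_rate)  # normalized per parameter
    print(f"gradient = {gradient:>6}: SGD step = {sgd_step:.2e}, Adam step = {adam_step:.2e}")
```

Dividing by the square root of the second moment cancels most of the gradient's magnitude, so both parameters move by roughly the base rate even though their gradients differ by a factor of 10,000. The per-parameter scaling is adaptive, but the overall step size is still set by the base rate.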

Key idea

The learning rate controls step size: too large and training diverges, too small and it crawls, so tuning and scheduling the rate are central to making gradient descent converge well.

Check yourself

1. What typically happens with a learning rate that is far too high?

2. Why decay the learning rate over training?

3. A loss curve that flattens almost immediately and barely moves suggests what?