Machine Learning

The Learning Rate Schedule

Why a single fixed step size rarely trains a model well.

The learning rate sets how large each parameter update is. A schedule changes that step size over the course of training instead of holding it fixed, which usually speeds up training and reaches a lower final loss.

The tradeoff

  • A large rate moves quickly but can overshoot and bounce.
  • A small rate is stable but painfully slow and may stall.
  • No single value is ideal for the whole run.
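
To make the tradeoff concrete, here is a minimal sketch that runs plain gradient descent on the toy function f(x) = x**2. The function, the starting point, and the three rates are illustrative assumptions, not values from any real training run.

```python
def descend(lr, steps=20, x0=1.0):
    """Run plain gradient descent on f(x) = x**2 and return every iterate."""
    xs = [x0]
    for _ in range(steps):
        grad = 2.0 * xs[-1]            # derivative of x**2
        xs.append(xs[-1] - lr * grad)  # one fixed-size update step
    return xs


# Hypothetical rates chosen to show each regime on this toy problem.
large = descend(lr=0.9)        # overshoots the minimum and flips sign each step
small = descend(lr=0.001)      # stable, but barely moves in 20 steps
too_large = descend(lr=1.1)    # every overshoot is bigger than the last

if __name__ == "__main__":
    for name, xs in [("large", large), ("small", small), ("too large", too_large)]:
        print(f"{name:>9}: first steps {xs[:4]}, final |x| = {abs(xs[-1]):.4f}")
```

On this function each update multiplies x by (1 - 2 * lr), so a rate of 0.9 bounces across the minimum while shrinking, 0.001 creeps toward it, and 1.1 grows without bound.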

Common schedules

A typical plan starts with a moderate rate to make fast early progress, then decays it so the model can settle gently into a minimum. Step decay cuts the rate by a fixed factor at preset milestones. Exponential decay multiplies it by a constant factor below one at every step or epoch, so it shrinks smoothly. Many modern runs combine a short warmup, during which the rate ramps up from a small value, with a slow cosine decline.
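
The three schedules above can be sketched as small functions of the step number. The function names, default rates, milestones, and warmup length here are all hypothetical choices for illustration. Deep learning libraries ship their own scheduler classes with their own parameter names.

```python
import math


def step_decay(step, base_lr=0.1, drop=0.1, milestones=(30, 60)):
    """Multiply the rate by `drop` once for every milestone already reached."""
    passed = sum(1 for m in milestones if step >= m)
    return base_lr * drop ** passed


def exponential_decay(step, base_lr=0.1, gamma=0.95):
    """Shrink the rate by the constant factor `gamma` at every step."""
    return base_lr * gamma ** step


def warmup_cosine(step, total_steps=100, warmup_steps=10, base_lr=0.1, min_lr=0.0):
    """Ramp up linearly during warmup, then follow half a cosine down to `min_lr`."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at min_lr if training runs past total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 9, 10, 30, 60, 100):
        print(s, step_decay(s), exponential_decay(s), warmup_cosine(s))
```

In a training loop, one of these would be called once per step (or per epoch) and its return value used as the step size for that update.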

Why decay helps

Early in training the parameters are far from any good region, so big steps pay off. Later the model is near a minimum, where big steps would just rattle around it. Shrinking the rate lets early speed and late precision both happen in one run.
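
The rattling effect is easy to see in a toy experiment. The sketch below assumes a one-dimensional loss f(x) = x**2 / 2 whose gradient is corrupted by Gaussian noise, as a stand-in for minibatch noise. The noise level, rates, and decay rule are made-up values for illustration only.

```python
import random


def noisy_descent(lr_at, steps=500, x0=5.0, seed=0):
    """Gradient descent on f(x) = x**2 / 2 with noisy gradients.

    `lr_at` maps a step number to the learning rate used at that step.
    Returns the average distance from the minimum over the last 100 steps.
    """
    rng = random.Random(seed)  # same seed gives both runs the same noise
    x = x0
    tail = []
    for t in range(steps):
        grad = x + rng.gauss(0.0, 1.0)  # true gradient plus assumed noise
        x -= lr_at(t) * grad
        if t >= steps - 100:
            tail.append(abs(x))
    return sum(tail) / len(tail)


# Hypothetical schedules: one fixed rate, one that starts equal and decays.
constant_error = noisy_descent(lambda t: 0.5)
decayed_error = noisy_descent(lambda t: 0.5 / (1.0 + 0.05 * t))

if __name__ == "__main__":
    print(f"constant rate, late average |x|: {constant_error:.3f}")
    print(f"decayed rate,  late average |x|: {decayed_error:.3f}")
```

Both runs start with the same rate and approach the minimum quickly, but the constant-rate run keeps getting knocked around by the noise, while the decayed run takes smaller and smaller steps and stays much closer to it.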

Key idea

A learning rate schedule starts large for speed and decays for precision, combining fast progress with a gentle landing.

Check yourself

1. Why decay the learning rate over training?

2. A learning rate that is too large tends to