Learning Rate Intuition
The learning rate is a single number that controls how far the optimizer steps each time it updates the parameters. It is often the most important hyperparameter to tune, and even a modest change can be the difference between fast convergence and a run that stalls or diverges.
Think of descending a hill in fog. The gradient tells you which way is downhill. The learning rate decides how big a stride you take in that direction.
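The analogy can be written as a single update rule. The form below is a sketch that assumes plain gradient descent; the symbols are introduced here for illustration and do not come from the text above: theta for the parameters, eta for the learning rate, and L for the loss.

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)
```

The gradient supplies the direction, and eta scales the length of the stride taken against it.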
The extremes both fail:
- A learning rate that is too small takes tiny steps, so training crawls and may not reach a good solution in a reasonable time.
- A learning rate that is too large overshoots the valley, bouncing from one side to the other or climbing out of it entirely as the loss diverges.
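Both failure modes are easy to reproduce on a toy problem. The sketch below assumes plain gradient descent on the one-dimensional function f(x) = x**2, whose minimum is at x = 0. The function name gradient_descent and the three rates are illustrative choices, not values recommended for real models.

```python
def gradient_descent(lr, start=1.0, steps=20):
    """Minimize f(x) = x**2 with plain gradient descent.

    Returns the list of x values visited, starting with `start`.
    """
    x = start
    path = [x]
    for _ in range(steps):
        grad = 2.0 * x      # derivative of x**2 at the current point
        x = x - lr * grad   # step against the gradient, scaled by lr
        path.append(x)
    return path


# Illustrative rates for this toy function only.
too_small = gradient_descent(lr=0.001)   # barely moves away from the start
reasonable = gradient_descent(lr=0.3)    # closes in on the minimum at 0
too_large = gradient_descent(lr=1.1)     # flips sign and grows every step

for name, path in [("too small", too_small),
                   ("reasonable", reasonable),
                   ("too large", too_large)]:
    print(f"{name:>10}: final |x| = {abs(path[-1]):.3g}")
```

On this function each step multiplies x by (1 - 2 * lr). A rate of 0.001 shrinks x by only 0.2% per step, a rate of 0.3 shrinks it by 60% per step, and a rate of 1.1 multiplies it by -1.2, so the iterate jumps across the valley and moves farther away each time.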
A well-chosen rate makes rapid progress when far from a minimum and settles smoothly as it nears one. Because the ideal value changes during training, practitioners often use a schedule that starts higher and decays over time, or adaptive optimizers that adjust the effective step for each parameter.
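One simple way to start higher and decay over time is exponential decay, sketched below. The function name exponential_decay, the starting rate of 0.1, and the decay factor of 0.95 are assumptions made for illustration. Real training code would usually rely on the scheduler utilities of its framework instead.

```python
def exponential_decay(initial_lr, decay_rate, step):
    """Return the learning rate at `step`: initial_lr * decay_rate ** step."""
    return initial_lr * decay_rate ** step


# Sample the schedule every 25 steps (illustrative values only).
sampled_steps = list(range(0, 100, 25))
schedule = [exponential_decay(0.1, 0.95, step) for step in sampled_steps]

for step, lr in zip(sampled_steps, schedule):
    print(f"step {step:3d}: lr = {lr:.5f}")
```

The rate begins at its full value and shrinks by a constant factor each step, so early updates are large and later ones become progressively more cautious.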
A useful diagnostic is the loss curve. A smooth, steady decrease suggests a healthy rate. Wild oscillation suggests it is too high. A nearly flat line suggests it is too low.
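The three patterns above can be turned into a rough automated check. The sketch below is a crude heuristic, not an established method. The function name diagnose_loss_curve and both thresholds are assumptions chosen for illustration, and a flat or noisy curve can have causes other than the learning rate.

```python
def diagnose_loss_curve(losses, flat_tol=0.01, up_tol=0.3):
    """Crudely label a loss curve using arbitrary, illustrative thresholds.

    flat_tol: minimum relative drop from first to last loss to count as progress.
    up_tol:   maximum fraction of steps on which the loss may rise.
    """
    if len(losses) < 2 or losses[0] == 0:
        raise ValueError("need at least two losses and a nonzero first loss")

    first, last = losses[0], losses[-1]
    # Fraction of consecutive steps where the loss went up.
    ups = sum(1 for a, b in zip(losses, losses[1:]) if b > a)
    up_fraction = ups / (len(losses) - 1)
    relative_drop = (first - last) / abs(first)

    if last > first or up_fraction > up_tol:
        return "possibly too high"
    if relative_drop < flat_tol:
        return "possibly too low"
    return "looks healthy"


# Synthetic curves standing in for real training logs.
healthy = [0.8 ** i for i in range(20)]                  # smooth, steady decrease
oscillating = [1.0 + 0.5 * (-1) ** i for i in range(20)]  # bounces up and down
flat = [1.0 - 0.0001 * i for i in range(20)]             # barely changes

print(diagnose_loss_curve(healthy))      # looks healthy
print(diagnose_loss_curve(oscillating))  # possibly too high
print(diagnose_loss_curve(flat))         # possibly too low
```

In practice the thresholds would need adjusting to the noise level of the task, and looking at the plotted curve remains the more reliable check.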
Key idea
The learning rate scales each update; too small crawls, too large diverges, and a good value or schedule balances speed with stability.