The picture
Imagine the loss as a hilly landscape over the parameters. Gradient descent walks downhill by repeatedly stepping in the direction that lowers the loss fastest.
- The gradient points in the direction of steepest increase.
- We step in the opposite direction, scaled by the learning rate.
- We repeat until the gradient is near zero.
The update rule
Each step computes the gradient of the loss with respect to every parameter, then moves each parameter a small amount against its gradient. Small steps trace a smooth path toward a minimum.
- A large step can overshoot the valley.
- A tiny step is safe but slow.
Where it goes
On a smooth surface the path curves toward a low point. The slope flattens as we approach a minimum, so steps naturally shrink near the bottom.
Gradient descent is the engine behind most model training, from linear regression to deep networks.
Key idea
Gradient descent minimizes a loss by repeatedly stepping against the gradient, letting the local slope guide each move toward a valley.