The Gradient Descent Intuition

The picture

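Imagine the loss as a hilly landscape over the parameters: each point is one setting of the parameters, and its height is the loss at that setting. Gradient descent walks downhill by repeatedly stepping in the direction in which the loss decreases fastest locally.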

The gradient points in the direction of steepest increase.
We step in the opposite direction, scaled by the learning rate.
We repeat until the gradient is near zero.
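The three steps above can be sketched as a short loop. This is a minimal one-dimensional sketch. It assumes the derivative of the loss is known in closed form. The toy loss L(x) = (x - 3)^2, the helper name `gradient_descent`, and its default learning rate and tolerance are all illustrative choices, not part of the original text.

```python
def gradient_descent(grad, x0, learning_rate=0.1, tol=1e-6, max_steps=10_000):
    """Minimize a 1-D function given its derivative `grad` (hypothetical helper)."""
    x = x0
    for _ in range(max_steps):
        g = grad(x)                  # slope here: points toward steepest increase
        if abs(g) < tol:             # stop once the gradient is near zero
            break
        x = x - learning_rate * g    # step the opposite way, scaled by the learning rate
    return x

# Toy loss L(x) = (x - 3)^2 has derivative 2(x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))
```

Here each step shrinks the distance to x = 3 by a constant factor, so the loop ends close to the minimum well before `max_steps`.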

The update rule

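Each step computes the gradient of the loss with respect to every parameter. It then moves each parameter against its own partial derivative, by the learning rate times that derivative. Small steps trace a smooth path toward a minimum, though it may be a local minimum rather than the lowest point overall.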
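As a formula, one step reads as follows. The notation is introduced here for illustration and is not from the original: θ_j is a parameter, η is the learning rate, and L is the loss.

```latex
\theta_j \leftarrow \theta_j - \eta \, \frac{\partial L}{\partial \theta_j}
\quad \text{for every } j,
\qquad \text{i.e.} \qquad
\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \eta \, \nabla L(\boldsymbol{\theta})
```

All parameters are updated using the gradient evaluated at the same current point, before any of them change.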

A step that is too large can overshoot the valley, and the loss may even grow instead of shrinking.
A very small step is safe but slow.
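A small experiment makes both failure modes visible. This sketch assumes the toy loss L(x) = x^2, whose derivative is 2x. It uses three learning rates picked only for illustration, and `run` is a hypothetical helper name.

```python
def run(learning_rate, steps=5, x0=1.0):
    """Take a few gradient steps on L(x) = x^2 (derivative 2x); record every x."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - learning_rate * 2 * xs[-1])
    return xs

for lr in (0.01, 0.4, 1.1):
    print(lr, [round(x, 3) for x in run(lr)])
```

With 0.01, x creeps toward 0 and is still about 0.9 after five steps. With 0.4, it shrinks quickly toward the minimum. With 1.1, each step jumps past the minimum to the other side, and |x| grows, so the iterates diverge.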

Where it goes

On a smooth surface the path curves toward a low point. Each step is the learning rate times the gradient, and the slope flattens as we approach a minimum. So steps naturally shrink near the bottom, even when the learning rate stays fixed.
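The shrinking steps can be checked directly. This sketch assumes the toy loss L(x) = x^2 and a fixed learning rate of 0.1. The helper `step_sizes` is hypothetical. It records the length of each step and never changes the learning rate.

```python
def step_sizes(learning_rate=0.1, x0=4.0, steps=6):
    """Return |step| for each iteration of gradient descent on L(x) = x^2."""
    x, sizes = x0, []
    for _ in range(steps):
        step = learning_rate * 2 * x   # step length = learning rate * slope
        sizes.append(abs(step))
        x -= step
    return sizes

print([round(s, 4) for s in step_sizes()])
```

Every step is shorter than the one before it. The only cause is the flattening slope, since the learning rate stays at 0.1 throughout.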

Gradient descent, together with variants such as stochastic gradient descent, is the engine behind most model training, from linear regression to deep networks.
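As a small concrete case, here is linear regression fitted by gradient descent on mean squared error. This is a sketch with several assumptions. The toy data are made up and lie exactly on y = 2x + 1. The learning rate and step count were chosen by hand. `fit_line` is a hypothetical helper. Linear regression also has a closed-form solution; gradient descent is used here only to illustrate the update rule with more than one parameter.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w*x + b by gradient descent on mean squared error (illustrative)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of MSE = (1/n) * sum((w*x + b - y)^2)
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Move every parameter against its own partial derivative
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (made up for illustration)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

On this data the fitted slope and intercept land very close to 2 and 1. A noticeably larger learning rate, around 0.15 or more for this data, would make the updates overshoot and diverge.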

Key idea

Gradient descent minimizes a loss by repeatedly stepping against the gradient, letting the local slope guide each move toward a valley.
