Gradient Descent

What it is

Gradient descent is the optimization algorithm that powers most machine learning training. The goal is to find model parameters that minimize a loss function, a number that measures how wrong the predictions are.
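
To make "a number that measures how wrong the predictions are" concrete, here is a minimal sketch of one common loss function, mean squared error. The choice of this loss and the name mean_squared_error are assumptions for illustration; the text does not commit to any particular loss.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Perfect predictions give a loss of zero.
perfect = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])

# Predictions that are each off by one give a larger loss.
off_by_one = mean_squared_error([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])

print(perfect, off_by_one)
```

The lower the number, the closer the predictions are to the targets, which is what training tries to achieve.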

The intuition

Imagine standing on a foggy hillside trying to reach the valley. You cannot see far, but you can feel which way is downhill. You take a small step in that direction, then repeat. The gradient plays the role of that feeling for the loss: it points in the direction of steepest increase, so its negative tells you which way is downhill.

The update rule

At each step the algorithm:

Computes the gradient of the loss with respect to each parameter
Scales that gradient by the learning rate, which controls the size of the step
Moves each parameter by that amount in the opposite direction of its gradient
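
The steps above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not a production optimizer: the one-parameter loss (w - 3)^2, its hand-derived gradient, and the names loss, gradient, and gradient_descent are all made up for the demonstration.

```python
def loss(w):
    # Hypothetical loss with a single parameter; its minimum is at w = 3.
    return (w - 3.0) ** 2


def gradient(w):
    # Derivative of (w - 3)^2 with respect to w, worked out by hand.
    return 2.0 * (w - 3.0)


def gradient_descent(w, learning_rate=0.1, steps=50):
    for _ in range(steps):
        grad = gradient(w)           # compute the gradient
        step = learning_rate * grad  # scale it by the learning rate
        w = w - step                 # move in the opposite direction
    return w


w_final = gradient_descent(w=0.0)
print(w_final, loss(w_final))
```

Starting from w = 0, each update shrinks the distance to the minimum, so w_final ends up very close to 3 and the loss very close to 0.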

A learning rate that is too large overshoots the minimum and can make training diverge. One that is too small makes training crawl. Variants like stochastic gradient descent estimate the gradient from a small random batch of data, so each step is cheap to compute but noisier than a step computed on the full dataset.
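
The effect of the learning rate can be seen with a small experiment. The sketch below assumes the same kind of toy loss, (w - 3)^2, and a hypothetical helper named run; the three learning rates were picked only to show the three behaviors and are not recommendations. It does not cover the stochastic variant.

```python
def run(learning_rate, steps=20, w=0.0):
    """Run plain gradient descent on the toy loss (w - 3)^2."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # gradient of (w - 3)^2
        w = w - learning_rate * grad
    return w


too_small = run(0.001)   # barely moves away from the starting point
reasonable = run(0.1)    # gets close to the minimum at w = 3
too_large = run(1.1)     # each step overshoots by more than the last

for name, w in [("too small", too_small),
                ("reasonable", reasonable),
                ("too large", too_large)]:
    print(f"{name}: w = {w:.3f}, distance from minimum = {abs(w - 3.0):.3f}")
```

On this particular loss, any learning rate above 1.0 makes the distance to the minimum grow at every step, which is what divergence looks like in practice.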

Key idea

Gradient descent repeatedly nudges parameters in the direction that most reduces loss, with step size set by the learning rate.
