What it is
Gradient descent is the optimization algorithm that powers most machine learning training. The goal is to find model parameters that minimize a loss function, a number that measures how wrong the predictions are.
The intuition
Imagine standing on a foggy hillside trying to reach the valley. You cannot see far, but you can feel which way is downhill. You take a small step in that direction, then repeat. The gradient is the mathematical version of "which way is downhill" for the loss.
The update rule
At each step the algorithm:
- Computes the gradient of the loss with respect to each parameter
- Moves each parameter a small step in the direction opposite to its gradient
- Scales the size of that step by the learning rate
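The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the one-parameter loss (w - 3)^2, the names loss, grad, and gradient_descent, and the default learning rate of 0.1 are all assumptions chosen to keep the example small.

```python
def loss(w):
    """Toy loss with its minimum at w = 3 (an assumption for illustration)."""
    return (w - 3.0) ** 2


def grad(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0, learning_rate=0.1, steps=100):
    """Repeat the update w <- w - learning_rate * grad(w)."""
    w = w0
    for _ in range(steps):
        # Step against the gradient, scaled by the learning rate.
        w = w - learning_rate * grad(w)
    return w


w_final = gradient_descent(w0=0.0)
print(w_final, loss(w_final))
```

Starting from w = 0, the parameter moves toward 3, where the toy loss is smallest. A real model has many parameters, and the same update is applied to each of them using its own partial derivative.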
A learning rate that is too large overshoots the valley and may cause training to diverge. One that is too small makes training crawl. Variants such as stochastic gradient descent estimate the gradient from small random batches of data, so each step is fast and cheap but noisy.
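Both effects can be seen on toy problems. The sketch below is illustrative only: the loss (w - 3)^2, the three learning rates, the noise-free data generated with w = 2, and the helper names run_gd and sgd_fit are assumptions made for this example.

```python
import random


def grad(w):
    """Gradient of the toy loss (w - 3)^2, an assumed example."""
    return 2.0 * (w - 3.0)


def run_gd(learning_rate, steps=20, w0=0.0):
    """Plain gradient descent from w0 for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


# Distance from the minimum at w = 3 after 20 steps, for three learning rates.
for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr}: distance from minimum = {abs(run_gd(lr) - 3.0):.4f}")


def sgd_fit(xs, ys, learning_rate=0.5, batch_size=10, steps=200, seed=0):
    """Mini-batch SGD for the one-parameter model y = w * x with squared error."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        # Pick a small random batch of example indices.
        batch = rng.sample(range(len(xs)), batch_size)
        # Gradient of the mean squared error, estimated on the batch only.
        g = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / batch_size
        w -= learning_rate * g
    return w


xs = [i / 100 for i in range(100)]
ys = [2.0 * x for x in xs]  # noise-free data generated with w = 2
w_sgd = sgd_fit(xs, ys)
print(w_sgd)
```

With the smallest rate the parameter has barely moved after 20 steps, and with the largest it ends up farther from the minimum than where it started. The mini-batch version looks at only 10 of the 100 examples per step, yet it still recovers a value close to 2 on this data.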
Key idea
Gradient descent repeatedly nudges parameters in the direction that reduces the loss fastest, with the step size set by the learning rate.