Machine Learning

Backpropagation

The chain rule applied to efficiently train neural networks.

6 min read · advanced

The problem it solves

A neural network has many layers and thousands or millions of weights. To train it with gradient descent, you need the gradient of the loss with respect to every weight. Backpropagation computes all of them efficiently in one backward sweep.

Forward then backward

Training on each batch has two phases:

  • The forward pass feeds inputs through the layers to produce a prediction and a loss
  • The backward pass applies the chain rule from calculus, propagating the error gradient from the output back through each layer

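By caching the activations from the forward pass and reusing each layer's gradient when computing the gradient of the layer before it, backpropagation finds every gradient at a cost within a small constant factor of one forward pass, rather than rerunning the network separately for each weight.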
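
The two phases can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the tiny network (two inputs, two tanh hidden units, one linear output, squared-error loss), the weight values, and the names forward, backward, and numeric_grad_w1 are all assumptions made for this example. The finite-difference helper is included only to show the slower alternative, which needs extra forward passes for every single weight.

```python
import math

def forward(x, y, w1, w2):
    """Forward pass: input -> tanh hidden layer -> linear output -> loss."""
    # Hidden pre-activations, then activations (cached for the backward pass).
    z = [sum(w * xi for w, xi in zip(row, x)) for row in w1]
    h = [math.tanh(zj) for zj in z]
    y_hat = sum(w * hj for w, hj in zip(w2, h))
    loss = 0.5 * (y_hat - y) ** 2
    return loss, (h, y_hat)

def backward(x, y, w1, w2, cache):
    """Backward pass: chain rule from the loss back to every weight."""
    h, y_hat = cache
    d_yhat = y_hat - y                    # dL/dy_hat
    grad_w2 = [d_yhat * hj for hj in h]   # dL/dw2[j]
    d_h = [d_yhat * w for w in w2]        # dL/dh[j], reuses d_yhat
    # tanh'(z) = 1 - tanh(z)^2, computed from the cached activations.
    d_z = [dh * (1.0 - hj ** 2) for dh, hj in zip(d_h, h)]
    grad_w1 = [[dz * xi for xi in x] for dz in d_z]  # dL/dw1[j][i]
    return grad_w1, grad_w2

def numeric_grad_w1(x, y, w1, w2, j, i, eps=1e-6):
    """Central finite difference for ONE weight: two extra forward passes."""
    w_plus = [row[:] for row in w1]
    w_minus = [row[:] for row in w1]
    w_plus[j][i] += eps
    w_minus[j][i] -= eps
    loss_plus = forward(x, y, w_plus, w2)[0]
    loss_minus = forward(x, y, w_minus, w2)[0]
    return (loss_plus - loss_minus) / (2 * eps)

# Hypothetical example values, chosen only for illustration.
x, y = [0.5, -1.0], 0.3
w1 = [[0.1, -0.2], [0.4, 0.3]]
w2 = [0.7, -0.5]

loss, cache = forward(x, y, w1, w2)
grad_w1, grad_w2 = backward(x, y, w1, w2, cache)
```

One call to backward returns all six gradients, while numeric_grad_w1 has to be called once per weight. Comparing the two on a small network like this is a common way to check a hand-written backward pass.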

Why the chain rule

Each layer's output depends on the previous layer's output. The chain rule multiplies local derivatives along this path, so the gradient at an early layer is a product of many terms. If most of those terms are smaller than 1 in magnitude, the product shrinks toward zero; if most are larger than 1, it grows rapidly. This is why very deep networks can suffer from vanishing or exploding gradients.
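
The product of local derivatives can be seen in a deliberately simplified sketch. It assumes a chain of scalar linear layers h_k = weight * h_(k-1), so every local derivative equals the same weight. Real networks have matrices and nonlinearities, so this shows only the multiplicative effect. The function name input_gradient and the chosen weights and depth are hypothetical.

```python
def input_gradient(weight, depth):
    """Gradient of the last layer's output with respect to the input.

    Assumes `depth` scalar layers, each computing h_k = weight * h_(k-1),
    so the chain rule gives a product of `depth` copies of `weight`.
    """
    grad = 1.0
    for _ in range(depth):
        grad *= weight  # multiply in this layer's local derivative
    return grad

shrinking = input_gradient(0.5, 50)  # local derivatives below 1
stable = input_gradient(1.0, 50)     # local derivatives exactly 1
growing = input_gradient(1.5, 50)    # local derivatives above 1

print(shrinking, stable, growing)
```

With 50 layers, the first value falls below 1e-12 and the last exceeds 1e6, while a local derivative of exactly 1 leaves the gradient unchanged.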

Key idea

Backpropagation uses the chain rule in a single backward pass to compute every weight's gradient efficiently, enabling deep networks to learn.

Check yourself

1. What mathematical rule underlies backpropagation?

2. What is the main efficiency benefit of backpropagation?