Machine Learning

Residual Connections

Skip paths let gradients flow and make very deep nets trainable.

The skip path

A residual connection adds a layer's input directly to its output, so the layer only has to learn a residual correction rather than the full mapping.

  • The output is the input plus the transformed input: y = x + F(x), where F is the layer's transformation.
  • This creates a direct path for the signal to skip the layer.
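
The two points above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the names `residual`, `zero_layer`, and `scale_layer` are hypothetical, and vectors are ordinary lists rather than tensors from any particular framework.

```python
from typing import Callable, List

Vector = List[float]


def residual(layer: Callable[[Vector], Vector], x: Vector) -> Vector:
    """Return x + layer(x): the skip path added elementwise to the layer output."""
    fx = layer(x)
    if len(fx) != len(x):
        # The addition only makes sense when the layer preserves the shape.
        raise ValueError("layer output must have the same length as its input")
    return [xi + fi for xi, fi in zip(x, fx)]


def zero_layer(x: Vector) -> Vector:
    """Hypothetical layer that outputs zeros, so the block acts as the identity."""
    return [0.0 for _ in x]


def scale_layer(x: Vector) -> Vector:
    """Hypothetical layer that contributes a correction of half the input."""
    return [0.5 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual(zero_layer, x)    # input passes through untouched
corrected_out = residual(scale_layer, x)  # input plus the learned correction
```

With `zero_layer` the block returns its input unchanged, which is the "skip the layer" behaviour; with `scale_layer` the output is the input plus a correction.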

Why it works

  • The addition passes the gradient along the skip path unchanged, which eases vanishing gradients.
  • A block can default to the identity when its layer outputs near zero, so extra depth is far less likely to hurt.
  • This made networks with hundreds of layers trainable.
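
A toy calculation can make the gradient argument concrete. The sketch below assumes a chain of scalar layers that all share the same local slope; `chain_gradient` and its parameters are hypothetical names chosen for illustration. By the chain rule, a plain layer contributes a factor of `local_slope` to the end-to-end gradient, while a residual layer y = x + F(x) contributes `1 + local_slope`.

```python
def chain_gradient(local_slope: float, depth: int, use_skip: bool) -> float:
    """Multiply per-layer derivatives through a stack of identical scalar layers."""
    grad = 1.0
    for _ in range(depth):
        # Plain layer: d/dx F(x) = local_slope.
        # Residual layer: d/dx (x + F(x)) = 1 + local_slope.
        layer_derivative = local_slope + (1.0 if use_skip else 0.0)
        grad *= layer_derivative
    return grad


# Small slopes, 50 layers deep (values chosen only for illustration).
plain_grad = chain_gradient(local_slope=0.01, depth=50, use_skip=False)
residual_grad = chain_gradient(local_slope=0.01, depth=50, use_skip=True)

print(f"plain:    {plain_grad:.3e}")
print(f"residual: {residual_grad:.3f}")
```

In this setup the plain stack's gradient shrinks toward zero, while the residual stack's gradient stays on the order of one because every factor includes the identity term. Real networks have matrix-valued derivatives and varying slopes, so this shows the mechanism only and does not predict actual gradient magnitudes.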

Residuals are core to deep convolutional nets such as ResNets and to standard transformer blocks, where they wrap both the attention and the feedforward sublayers.
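
The wrapping pattern in a transformer block can be sketched as two residual additions in sequence. The code below is a simplified illustration, not a real transformer: `toy_attention` and `toy_feed_forward` are hypothetical stand-ins for the actual sublayers, and the layer normalization that real blocks include is left out to keep the skip paths visible.

```python
from typing import Callable, List

Tokens = List[List[float]]  # one row of features per token


def add(a: Tokens, b: Tokens) -> Tokens:
    """Elementwise sum of two token matrices of the same shape."""
    return [[u + v for u, v in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def toy_attention(x: Tokens) -> Tokens:
    """Stand-in for attention: every token receives the mean of all tokens."""
    n = len(x)
    mean = [sum(col) / n for col in zip(*x)]
    return [list(mean) for _ in x]


def toy_feed_forward(x: Tokens) -> Tokens:
    """Stand-in for the feedforward sublayer: per-token ReLU scaled by 0.5."""
    return [[0.5 * max(0.0, v) for v in row] for row in x]


def transformer_block(
    x: Tokens,
    attention: Callable[[Tokens], Tokens] = toy_attention,
    feed_forward: Callable[[Tokens], Tokens] = toy_feed_forward,
) -> Tokens:
    """Apply both sublayers, each wrapped in its own residual connection."""
    x = add(x, attention(x))      # first skip path, around attention
    x = add(x, feed_forward(x))   # second skip path, around the feedforward
    return x


tokens = [[1.0, 0.0], [3.0, 2.0]]
out = transformer_block(tokens)
```

If both sublayers returned zeros, the block would hand its input through unchanged, which is the same identity default described for a single layer.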

Key idea

Residual connections add the input back to the output, giving gradients a clean path and letting layers learn small corrections so very deep networks stay trainable.

Check yourself

1. What does a residual connection compute?

2. Why do residuals help train deep networks?