The idea
A convolutional layer slides a small learnable filter across the input grid. At each position it takes the dot product of the filter with the local patch of input, and these values together form a feature map.
- The same filter weights are reused at every position; this is called weight sharing.
- This gives translation equivariance: shifting the input shifts the feature map by the same amount, so a pattern is detected wherever it appears.
- Many filters run in parallel to form output channels.
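The sliding dot product and the three properties listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not an efficient implementation: it assumes a single-channel 2D input, stride 1, and no padding, and the names `conv2d`, `image`, and `filters` are made up for the example. Like most deep learning libraries, it computes a cross-correlation (the filter is not flipped).

```python
import numpy as np

def conv2d(image, filters):
    """Slide each filter over a 2D image (stride 1, no padding).

    image:   array of shape (H, W)
    filters: array of shape (C_out, kH, kW), one filter per output channel
    returns: array of shape (C_out, H - kH + 1, W - kW + 1)
    """
    H, W = image.shape
    C, kH, kW = filters.shape
    out = np.zeros((C, H - kH + 1, W - kW + 1))
    for c in range(C):                      # many filters -> output channels
        for i in range(H - kH + 1):
            for j in range(W - kW + 1):
                patch = image[i:i + kH, j:j + kW]
                # Same weights filters[c] at every (i, j): weight sharing.
                out[c, i, j] = np.sum(patch * filters[c])
    return out

# Toy input (assumed for illustration): a 2x2 bright block on a 6x6 grid.
image = np.zeros((6, 6))
image[1:3, 1:3] = 1.0

# Two hand-picked filters: a block detector and a vertical-edge detector.
filters = np.stack([
    np.ones((2, 2)),
    np.array([[1.0, -1.0], [1.0, -1.0]]),
])

fmap = conv2d(image, filters)

# Move the block 2 rows down and 2 columns right, then convolve again.
shifted = np.roll(image, shift=(2, 2), axis=(0, 1))
fmap_shifted = conv2d(shifted, filters)

# Equivariance: the feature map moves by the same (2, 2) offset.
print(np.array_equal(fmap_shifted[:, 2:, 2:], fmap[:, :-2, :-2]))
```

The final comparison only looks at the region where both feature maps are defined, since without padding the shifted response cannot be compared at the border.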
Why it helps
Compared with a fully connected layer, convolution uses far fewer parameters: each output depends only on a local receptive field, and every position shares the same filter weights.
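To get a feel for the size of the saving, the following sketch counts parameters for both layer types. The sizes are assumptions chosen only for illustration (a 32x32 RGB input, 16 output channels, 3x3 filters, output kept at 32x32); none of them come from the text above.

```python
# Assumed sizes, for illustration only.
H, W = 32, 32        # spatial size of input and output
c_in, c_out = 3, 16  # input and output channels
k = 3                # filter height and width

# Fully connected: every output unit has a weight for every input unit,
# plus one bias per output unit.
n_inputs = H * W * c_in
n_outputs = H * W * c_out
fc_params = n_inputs * n_outputs + n_outputs

# Convolution: one k x k x c_in filter per output channel, plus one bias
# per output channel. The count does not depend on H or W.
conv_params = k * k * c_in * c_out + c_out

print(f"fully connected: {fc_params:,}")
print(f"convolution:     {conv_params:,}")
print(f"ratio:           {fc_params / conv_params:,.0f}x")
```

The convolutional count is independent of the input resolution, which is why the gap widens further on larger images.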
Stacking layers grows the receptive field, so deeper units respond to larger and more abstract structures, progressing from edges to shapes to objects.
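How fast the receptive field grows can be worked out with a standard recurrence: each layer adds (kernel size - 1) times the cumulative stride of the layers before it. The sketch below assumes square kernels and no dilation; `receptive_field` and its `(kernel_size, stride)` layer description are hypothetical names for this example.

```python
def receptive_field(layers):
    """Receptive field size (in input pixels) after each layer.

    layers: list of (kernel_size, stride) pairs, ordered input to output.
    Assumes square kernels and no dilation.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # distance in input pixels between adjacent units
    sizes = []
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes

# Three stacked 3x3 layers with stride 1: growth is linear in depth.
print(receptive_field([(3, 1)] * 3))  # [3, 5, 7]

# The same layers with stride 2: growth is much faster.
print(receptive_field([(3, 2)] * 3))  # [3, 7, 15]
```

With stride 1 and kernel size k, the receptive field after n layers is 1 + n(k - 1), so depth alone grows it slowly; striding or pooling is what lets deep units cover large parts of the input.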
Key idea
Convolution exploits locality and weight sharing so the same pattern detector is applied everywhere, slashing parameters while preserving spatial structure.