The Convolution Operation

The convolution is the core operation of a convolutional network. It slides a small grid of weights, called a kernel, across the image and computes a weighted sum at each position.

How it works

At each location the kernel sits over a patch of the image. We multiply each kernel weight by the pixel beneath it and add the results into a single number. Sliding the kernel across every position produces a new grid of outputs.

The kernel is small, often three by three or five by five.
The same weights are reused at every position, which is called weight sharing.
The output is large where the patch matches the pattern the kernel encodes.

Why it is powerful

Because the kernel is reused everywhere, the network learns a pattern once and can find it anywhere in the image. This translation equivariance means an edge detector works in the corner just as well as in the center.

Weight sharing also keeps the number of parameters tiny compared to connecting every pixel to every output, which would be far too many weights for a real image.

Key idea

Convolution slides a shared small kernel over the image, computing weighted sums that detect local patterns anywhere they appear.

The Convolution Operation