The Convolution Arithmetic

Why the numbers matter

A convolution slides a small kernel across an image and writes one output per position. Getting the output size wrong breaks the next layer, so the arithmetic must be exact before you debug anything else.

The core formula

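For one spatial dimension, take the input length, add twice the padding, subtract the kernel size, divide by the stride and round down, then add one:

    output = floor((input + 2 * padding - kernel) / stride) + 1

The floor applies only to the division; the plus one is added afterwards.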

Kernel size sets how many input pixels each output sees.
Stride is the step between positions, so a stride of two roughly halves the output size.
Padding adds border pixels so edges are not lost.
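
The formula and its three parameters can be sketched as a small helper. This is a minimal illustration, not taken from any library: the function name conv_output_length and its argument names are assumptions made for this example, and it assumes the same padding is added on both sides.

```python
def conv_output_length(input_len: int, kernel: int,
                       padding: int = 0, stride: int = 1) -> int:
    """Output length along one spatial dimension (hypothetical helper)."""
    if kernel < 1 or stride < 1 or padding < 0:
        raise ValueError("kernel and stride must be >= 1, padding >= 0")
    padded_len = input_len + 2 * padding  # padding assumed symmetric
    if padded_len < kernel:
        raise ValueError("kernel does not fit inside the padded input")
    # Integer division is the floor here because the numerator is non-negative.
    return (padded_len - kernel) // stride + 1


print(conv_output_length(28, kernel=5))                       # 24
print(conv_output_length(28, kernel=5, padding=2, stride=2))  # 14
```

The second call shows why the floor matters: (28 + 4 - 5) / 2 is 13.5, which rounds down to 13 before the one is added.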

Same and valid

Two named modes appear often:

Valid padding adds nothing, so the output shrinks.
Same padding chooses padding so output size equals input size when stride is one. For an odd kernel size this is the kernel size minus one, divided by two, on each side.
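
The two modes can be written as a rule for picking the padding. The sketch below is illustrative only: padding_for_mode is a hypothetical name, it assumes stride one and symmetric padding, and it rejects even kernel sizes because symmetric padding cannot preserve the size for them.

```python
def padding_for_mode(mode: str, kernel: int) -> int:
    """Per-side padding for a named mode (hypothetical helper, stride one)."""
    if mode == "valid":
        return 0  # no border pixels, so the output shrinks
    if mode == "same":
        if kernel % 2 == 0:
            # Assumption of this sketch: only symmetric padding is handled.
            raise ValueError("symmetric same padding needs an odd kernel size")
        return (kernel - 1) // 2
    raise ValueError(f"unknown mode: {mode!r}")


input_len, kernel = 32, 5
for mode in ("valid", "same"):
    pad = padding_for_mode(mode, kernel)
    output_len = (input_len + 2 * pad - kernel) // 1 + 1  # stride is one
    print(mode, pad, output_len)
# valid 0 28
# same 2 32
```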

A worked case

An input of size 32 with a 3 by 3 kernel, padding 1, and stride 1 gives an output of 32, since (32 + 2 - 3) / 1 + 1 = 32. The same input with stride 2 gives 16, since 31 / 2 rounds down to 15 and 15 + 1 = 16. Tracing these by hand catches shape bugs early.
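
One way to double-check a hand trace is to count the kernel positions directly instead of using the formula. The following is a rough sketch under the same assumptions as the worked case (one dimension, symmetric padding); count_positions is a name invented for this example.

```python
def count_positions(input_len: int, kernel: int,
                    padding: int, stride: int) -> int:
    """Count where the kernel fits by sliding it (hypothetical helper)."""
    padded_len = input_len + 2 * padding
    count = 0
    start = 0
    # A window starting at `start` covers indices start .. start + kernel - 1.
    while start + kernel <= padded_len:
        count += 1
        start += stride
    return count


print(count_positions(32, kernel=3, padding=1, stride=1))  # 32
print(count_positions(32, kernel=3, padding=1, stride=2))  # 16
```

Counting positions this way is slower than the closed form, but it agrees with it, which makes it a convenient cross-check when a shape looks wrong.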

Key idea