Why the numbers matter
A convolution slides a small kernel across an image and writes one output per position. Getting the output size wrong breaks the next layer, so the arithmetic must be exact before you debug anything else.
The core formula
For one spatial dimension with input length n, kernel size k, padding p on each side, and stride s, the output length is floor((n + 2p - k) / s) + 1. The floor drops any final position where the kernel would run past the padded input.
- Kernel size sets how many input pixels each output sees.
- Stride is the step between positions, so a stride of two roughly halves the size.
- Padding adds border pixels, usually zeros, on each side so that edge pixels are not lost.
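The formula can be sketched as a small Python helper. The function name conv_output_size and its parameter names are illustrative assumptions, not taken from any library, and the sketch assumes equal padding on both sides and no dilation.

```python
def conv_output_size(n: int, k: int, p: int = 0, s: int = 1) -> int:
    """Output length along one spatial dimension.

    n: input length, k: kernel size, p: padding per side, s: stride.
    Hypothetical helper; assumes symmetric padding and no dilation.
    """
    if s < 1:
        raise ValueError("stride must be at least 1")
    if n + 2 * p < k:
        raise ValueError("kernel is larger than the padded input")
    # Integer division floors the quotient, matching the floor in the formula.
    return (n + 2 * p - k) // s + 1


print(conv_output_size(32, 3, p=1, s=1))  # 32
print(conv_output_size(32, 3, p=1, s=2))  # 16
```

The two guard clauses are a design choice for this sketch: they turn an impossible configuration into an error rather than a misleading size.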
Same and valid
Two named modes appear often:
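- Valid padding adds nothing, so the output shrinks: at stride one it loses kernel size minus one positions.
- Same padding chooses the padding so that the output size equals the input size when the stride is one. For an odd kernel size k this means (k - 1) / 2 pixels on each side.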
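A minimal sketch of the two modes, assuming stride one, an odd kernel size, and equal padding on both sides. The helper names valid_output and same_padding are hypothetical and chosen only for this illustration.

```python
def valid_output(n: int, k: int) -> int:
    # Valid mode: no padding, so the output loses k - 1 positions at stride 1.
    return n - k + 1


def same_padding(k: int) -> int:
    # Same mode at stride 1: pad (k - 1) / 2 pixels per side.
    # Only an odd kernel allows equal padding on both sides.
    if k % 2 == 0:
        raise ValueError("symmetric same padding needs an odd kernel size")
    return (k - 1) // 2


n, k = 32, 5
p = same_padding(k)
print(valid_output(n, k))    # 28
print(p, n + 2 * p - k + 1)  # 2 32
```

Frameworks often handle even kernels or larger strides by padding unevenly; this sketch deliberately leaves that case out.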
A worked case
An input of size 32 with a 3 by 3 kernel, padding 1, and stride 1 gives floor((32 + 2 - 3) / 1) + 1 = 32 along each dimension. The same input with stride 2 gives floor(31 / 2) + 1 = 15 + 1 = 16. Tracing these by hand catches shape bugs early.
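The same tracing can be mirrored in a few lines of Python. The sketch below pushes a size through a stack of layers and records the size after each one. The function name trace_sizes and the layer list are made up for illustration, with each layer given as a (kernel, padding, stride) tuple.

```python
def trace_sizes(n, layers):
    """Return the input size followed by the size after each layer."""
    sizes = [n]
    for k, p, s in layers:
        # Same formula as above: floor((n + 2p - k) / s) + 1.
        n = (n + 2 * p - k) // s + 1
        sizes.append(n)
    return sizes


# Hypothetical stack: stride-1 same, stride-2 downsample, then valid.
layers = [(3, 1, 1), (3, 1, 2), (3, 0, 1)]
print(trace_sizes(32, layers))  # [32, 32, 16, 14]
```

Comparing a list like this against the shapes a framework reports is one quick way to locate the layer where a mismatch first appears.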
Key idea
Output size is floor((input + 2 × padding - kernel) / stride) + 1. Same padding keeps the size fixed at stride one, while a larger stride shrinks it.