Machine Learning

The Convolution Arithmetic

Computing output size from kernel, stride, and padding so layers line up.

Why the numbers matter

A convolution slides a small kernel across an image and writes one output per position. Getting the output size wrong breaks the next layer, so the arithmetic must be exact before you debug anything else.

The core formula

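For one spatial dimension with input length n, kernel size k, padding p per side, and stride s, the output length is floor((n + 2p - k) / s) + 1. In words: add twice the padding to the input length, subtract the kernel size, divide by the stride and round down, then add one.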

  • Kernel size sets how many input pixels each output sees.
  • Stride is the step between positions, so a stride of two roughly halves the size.
  • Padding adds border pixels so edges are not lost.
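The formula can be sketched as a small helper. This is a minimal illustration, not any library's API: the name `conv_output_size` and its parameter names are made up here, and it assumes the same padding on both sides and no dilation.

```python
def conv_output_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial dimension.

    Hypothetical helper: assumes symmetric padding (the same amount on
    each side) and no dilation.
    """
    if kernel < 1 or stride < 1 or padding < 0:
        raise ValueError("kernel and stride must be >= 1, padding must be >= 0")
    padded = n + 2 * padding
    if padded < kernel:
        raise ValueError("kernel does not fit inside the padded input")
    # Integer division performs the floor for these non-negative values.
    return (padded - kernel) // stride + 1


print(conv_output_size(32, kernel=3, stride=1, padding=1))  # 32
print(conv_output_size(32, kernel=3, stride=2, padding=1))  # 16
print(conv_output_size(32, kernel=3))                       # 30
```

For a two-dimensional image, apply the same calculation once to the height and once to the width.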

Same and valid

Two named modes appear often:

  • Valid padding adds nothing, so the output shrinks.
  • Same padding chooses the padding so that output size equals input size when stride is one; for an odd kernel size k that is (k - 1) / 2 pixels per side.
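The two modes can be compared side by side. The sketch below assumes an odd kernel size and equal padding on both sides, in which case a per-side padding of (kernel - 1) / 2 keeps the size fixed at stride one; even kernels need unequal padding on the two sides, which this sketch does not cover. The function names are illustrative only.

```python
def output_size(n: int, kernel: int, stride: int, padding: int) -> int:
    """floor((n + 2 * padding - kernel) / stride) + 1 for one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


def same_padding(kernel: int) -> int:
    """Per-side padding that preserves size at stride one (odd kernels only)."""
    if kernel % 2 == 0:
        raise ValueError("this sketch only handles odd kernel sizes")
    return (kernel - 1) // 2


n = 32
for kernel in (1, 3, 5, 7):
    valid = output_size(n, kernel, stride=1, padding=0)
    same = output_size(n, kernel, stride=1, padding=same_padding(kernel))
    print(f"kernel={kernel}: valid -> {valid}, same -> {same}")
# kernel=1: valid -> 32, same -> 32
# kernel=3: valid -> 30, same -> 32
# kernel=5: valid -> 28, same -> 32
# kernel=7: valid -> 26, same -> 32
```

Valid mode loses kernel - 1 pixels per layer at stride one, so the shrinkage adds up across a deep stack.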

A worked case

An input of size 32 with a 3x3 kernel, padding 1, and stride 1 gives floor((32 + 2 - 3) / 1) + 1 = 32. The same input with stride 2 gives floor(31 / 2) + 1 = 15 + 1 = 16. Tracing these by hand catches shape bugs early.
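One way to check the worked case without trusting the formula is to count kernel positions directly. The sketch below slides a window along a padded row and records every start index where the kernel still fits; `window_starts` is a hypothetical name and the code assumes equal padding on both sides.

```python
def window_starts(n: int, kernel: int, stride: int, padding: int) -> list[int]:
    """Start indices (in padded coordinates) of every kernel position that fits."""
    padded = n + 2 * padding
    starts = []
    start = 0
    # A position is valid while the whole kernel lies inside the padded row.
    while start + kernel <= padded:
        starts.append(start)
        start += stride
    return starts


stride_one = window_starts(32, kernel=3, stride=1, padding=1)
stride_two = window_starts(32, kernel=3, stride=2, padding=1)

print(len(stride_one))  # 32
print(len(stride_two))  # 16
print(stride_two[-1])   # 30
```

At stride two the last window starts at padded index 30 and covers indices 30 to 32, so the final padded index, 33, is never used. That leftover is what the floor in the formula discards.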

Key idea

Output size is floor((input + 2 × padding - kernel) / stride) + 1. Same padding keeps the size fixed at stride one, and a larger stride shrinks it.

Check yourself

1. An input of 32 with a 3x3 kernel, padding 1, stride 2 gives what output size?

2. What does same padding achieve at stride one?