Downsampling
Convolutional networks usually reduce the spatial resolution of their feature maps as depth increases. The two common tools for this are pooling and strided convolution.
- Max pooling takes the largest value in each small window.
- Average pooling takes the mean over the window.
- A stride greater than one moves the filter several positions per step, so the filter is evaluated at fewer locations.
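The three operations above can be sketched on a tiny feature map. This is a minimal illustration in plain Python, not a reference implementation: the helper names pool2d and mean and the sample values are assumptions made up for this example.

```python
# Minimal sketch: max pooling, average pooling, and the effect of stride.
# pool2d, mean, and the sample feature map are hypothetical illustrations.

def pool2d(fmap, size, stride, reduce_fn):
    """Slide a size x size window over fmap, moving `stride` positions
    per step, and reduce each window to one value with reduce_fn."""
    rows, cols = len(fmap), len(fmap[0])
    out = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            window = [fmap[r + i][c + j] for i in range(size) for j in range(size)]
            out_row.append(reduce_fn(window))
        out.append(out_row)
    return out

def mean(values):
    return sum(values) / len(values)

fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 3, 9],
]

# 2x2 windows, stride 2: non-overlapping windows, 4x4 input -> 2x2 output.
max_pooled = pool2d(fmap, size=2, stride=2, reduce_fn=max)
avg_pooled = pool2d(fmap, size=2, stride=2, reduce_fn=mean)

# Same 2x2 windows with stride 1: every position is visited, output is 3x3.
max_stride1 = pool2d(fmap, size=2, stride=1, reduce_fn=max)

print(max_pooled)   # [[6, 2], [2, 9]]
print(avg_pooled)   # [[3.5, 1.0], [1.0, 6.0]]
print(max_stride1)  # [[6, 6, 2], [6, 6, 7], [2, 5, 9]]
```

Only the stride differs between max_pooled and max_stride1: with stride 2 the window skips every other position, which is what shrinks the output.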
Why downsample
- It reduces the number of activations, saving compute and memory.
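- It enlarges the receptive field faster: each position in a downsampled map stands for several input positions, so later filters cover a wider region of the original input.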
- It adds a little translation invariance, since small shifts often leave the pooled value unchanged.
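The shift-invariance point can be checked on a one-dimensional signal. The sketch below is a toy example under assumed values (max_pool_1d and the three signals are hypothetical); it shows that a one-step shift can leave the pooled output unchanged, while a larger shift does not.

```python
# Minimal sketch: max pooling tolerates some small shifts, but not all.
# max_pool_1d and the signals are hypothetical illustrations.

def max_pool_1d(signal, size=2, stride=2):
    """Max over windows of `size` values, moving `stride` positions per step."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, stride)]

base      = [0, 0, 5, 0, 0, 0]
shift_one = [0, 0, 0, 5, 0, 0]  # peak moved right by one position
shift_two = [0, 0, 0, 0, 5, 0]  # peak moved right by two positions

pooled_base = max_pool_1d(base)
pooled_one = max_pool_1d(shift_one)
pooled_two = max_pool_1d(shift_two)

print(pooled_base)  # [0, 5, 0]
print(pooled_one)   # [0, 5, 0]  same as before: the peak stayed in its window
print(pooled_two)   # [0, 0, 5]  different: the peak crossed into the next window
```

The pooled signals also have three values instead of six, which is the reduction in activations from the first bullet. The invariance is only partial because it depends on where the window boundaries fall.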
A stride of two roughly halves each spatial dimension, much like a pooling window of two applied with stride two.
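The "roughly halves" claim follows from the usual output-size formula for a sliding window without dilation, floor((n + 2p - k) / s) + 1, where n is the input length, k the window size, s the stride, and p the padding per side. A small sketch, with output_size as a hypothetical helper name:

```python
# Minimal sketch of the standard output-size formula (no dilation).
# output_size is a hypothetical helper, not a library function.

def output_size(n, kernel, stride, padding=0):
    """Number of window positions along one spatial dimension."""
    return (n + 2 * padding - kernel) // stride + 1

print(output_size(32, kernel=2, stride=2))             # 16: exactly half
print(output_size(32, kernel=3, stride=2, padding=1))  # 16: 3x3 filter, stride 2
print(output_size(7, kernel=2, stride=2))              # 3: odd input, last column dropped
print(output_size(7, kernel=3, stride=2, padding=1))   # 4: padding rounds up instead
```

For even inputs the result is exactly half. For odd inputs it rounds down or up depending on window size and padding, hence "roughly".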
Key idea
Pooling and strided convolution both downsample feature maps, trading spatial detail for efficiency, larger receptive fields, and modest shift invariance.