Pooling Layers Revisited
A pooling layer shrinks a feature map by summarizing small regions into single values. It reduces spatial resolution without any learned weights, which lowers the cost of later layers and makes the output less sensitive to small shifts in the input.
Max and average pooling
The two common kinds differ in how they summarize a window.
- Max pooling takes the largest value in each window, keeping the strongest activation.
- Average pooling takes the mean, giving a smoother summary.
A typical setup uses a 2x2 window with stride 2, which halves the height and width (exactly so when both are even). The channel count is unchanged because each channel is pooled independently.
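The two kinds of pooling and the 2x2, stride 2 setup can be sketched in NumPy as follows. This is a minimal illustration rather than how any particular framework implements pooling: the function name pool2d, the (channels, height, width) layout, the absence of padding, and the sample values are all assumptions made for the example.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Pool a (channels, height, width) array.

    No padding is applied; windows that would run past the edge are dropped.
    """
    c, h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((c, out_h, out_w), dtype=float)
    reduce = np.max if mode == "max" else np.mean
    for i in range(out_h):
        for j in range(out_w):
            r, s = i * stride, j * stride
            window = x[:, r:r + size, s:s + size]
            # Summarize each channel's window separately.
            out[:, i, j] = reduce(window, axis=(1, 2))
    return out

# One channel, 4x4 feature map (made-up values).
x = np.array([[[1, 3, 2, 0],
               [5, 7, 4, 6],
               [8, 2, 1, 1],
               [0, 4, 3, 9]]], dtype=float)

max_out = pool2d(x, mode="max")    # [[[7, 6], [8, 9]]]
avg_out = pool2d(x, mode="mean")   # [[[4, 3], [3.5, 3.5]]]
print(x.shape, "->", max_out.shape)  # (1, 4, 4) -> (1, 2, 2)
```

The loops are written for readability; the output keeps the single channel and halves each spatial side.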
Why pool
Pooling gives a small amount of translation invariance: if a feature shifts by a pixel, the pooled output often stays the same. It also enlarges the receptive field, the input area each later neuron sees, and cuts computation because later layers process fewer positions.
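A small experiment shows why the invariance holds only "often". In this sketch (the helper max_pool_2x2 and the toy 4x4 maps are assumptions for illustration), a single activation is moved one pixel at a time: the first shift stays inside the same 2x2 window and leaves the pooled output unchanged, while the second crosses a window boundary and changes it.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a (height, width) array with even sides."""
    h, w = x.shape
    # Split each axis into (block index, offset within block), then reduce the offsets.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.zeros((4, 4))
a[0, 0] = 1.0                 # feature at row 0, column 0
b = np.roll(a, 1, axis=1)     # shifted one pixel right: column 1, same window
c = np.roll(b, 1, axis=1)     # shifted one more pixel: column 2, next window

same_window = np.array_equal(max_pool_2x2(a), max_pool_2x2(b))
crossed_boundary = np.array_equal(max_pool_2x2(b), max_pool_2x2(c))
print(same_window, crossed_boundary)  # True False
```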
Global pooling
Global average pooling collapses each entire feature map to one number, producing a vector with one value per channel. Many modern networks use it in place of large fully connected layers, so the final classifier needs far fewer parameters.
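The parameter saving can be made concrete with a rough count. The sizes below (512 channels, 7x7 maps, 10 classes) are hypothetical, and the comparison assumes a single linear classifier with a bias in both cases; real architectures differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical final feature maps: (channels, height, width).
features = rng.standard_normal((512, 7, 7))

# Global average pooling: one mean per channel.
gap = features.mean(axis=(1, 2))
print(gap.shape)  # (512,)

num_classes = 10  # assumed for illustration

# Linear classifier on the pooled vector: weights plus biases.
params_after_gap = gap.size * num_classes + num_classes

# Linear classifier on the flattened maps instead (512 * 7 * 7 inputs).
params_after_flatten = features.size * num_classes + num_classes

print(params_after_gap, params_after_flatten)  # 5130 250890
```

Under these assumptions the classifier fed by global average pooling has about 49 times fewer weights, one factor for each spatial position that was averaged away.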
Key idea
Pooling downsamples feature maps by taking the max or mean of regions, adding small shift invariance and cutting cost without learned weights.