Pooling Layers Revisited

A pooling layer shrinks a feature map by summarizing each small region as a single value. It reduces spatial resolution without any learned weights, which lowers the cost of later layers and makes the output less sensitive to small shifts in the input.

Max and average pooling

The two common kinds differ in how they summarize a window.

Max pooling takes the largest value in each window, keeping the strongest activation.
Average pooling takes the mean, giving a smoother summary.

A typical setup uses a 2x2 window with stride 2, which halves the height and width (rounding down for odd sizes when no padding is used) while leaving the channel count unchanged.
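A minimal NumPy sketch of both kinds of pooling, assuming a (channels, height, width) layout and no padding. The function name pool2d and its parameters are illustrative and do not come from any particular library. With no padding, each output dimension is floor((n - size) / stride) + 1.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Pool a (channels, height, width) array with a square window, no padding."""
    c, h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((c, out_h, out_w), dtype=float)
    reduce = np.max if mode == "max" else np.mean
    for i in range(out_h):
        for j in range(out_w):
            # Slice the same window from every channel at once.
            window = x[:, i * stride:i * stride + size,
                          j * stride:j * stride + size]
            out[:, i, j] = reduce(window, axis=(1, 2))
    return out

# One channel, 4x4 input.
x = np.array([[[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]]], dtype=float)

max_out = pool2d(x, mode="max")    # [[4, 2], [2, 8]]
avg_out = pool2d(x, mode="mean")   # [[2.5, 1.0], [1.25, 6.5]]
```

The 4x4 map becomes 2x2 in both cases and the single channel is preserved. The explicit loops are for readability; real frameworks use vectorized kernels.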

Why pool

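Pooling gives a small amount of translation invariance: if a feature shifts by a pixel, the pooled output often stays the same, for example when the feature remains inside the same window. It also enlarges the receptive field of later neurons, meaning the region of the input each one sees, and cuts computation because later layers process fewer positions.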
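The shift behavior can be checked on a toy input. This is a sketch under simple assumptions: a single active pixel, a 4x4 map, and a hypothetical helper max_pool_2x2 that relies on a reshape trick valid only for even side lengths.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a (height, width) array with even sides."""
    h, w = x.shape
    # Split each axis into (block index, offset within block), then reduce offsets.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.zeros((4, 4)); a[0, 0] = 1.0   # feature at the top-left pixel
b = np.zeros((4, 4)); b[0, 1] = 1.0   # shifted right by one, same window
c = np.zeros((4, 4)); c[0, 2] = 1.0   # shifted again, now in the next window

same_window = np.array_equal(max_pool_2x2(a), max_pool_2x2(b))     # True
crossed_window = np.array_equal(max_pool_2x2(b), max_pool_2x2(c))  # False
```

The first shift leaves the pooled map unchanged, while the second crosses a window boundary and changes it. This is why the invariance is only partial.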

Global pooling

Global average pooling collapses each entire feature map to one number, producing a vector with one value per channel. Many modern networks use it in place of large fully connected layers, so the final classifier needs far fewer parameters.
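A short sketch of global average pooling and the parameter saving it brings. The sizes here (512 channels, 7x7 maps, 10 classes) are assumptions chosen for illustration, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((512, 7, 7))   # assumed (channels, height, width)

# Global average pooling: one mean per channel.
gap = features.mean(axis=(1, 2))              # shape (512,)

num_classes = 10                              # assumed classifier size

# Weights plus biases of a single linear classifier layer in each case.
params_after_gap = gap.size * num_classes + num_classes            # 5130
params_after_flatten = features.size * num_classes + num_classes   # 250890
```

Under these assumptions, the classifier fed by the pooled vector has about 49 times fewer weights than one fed by the flattened feature maps, since each 7x7 map contributes one input instead of 49.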

Key idea

Pooling downsamples feature maps by taking the max or mean of regions, adding small shift invariance and cutting cost without learned weights.
