What it is
A pooling layer reduces the height and width of a feature map by summarizing each small region as a single value. It usually follows one or more convolution layers in a vision network.
Common kinds
The two most common forms slide a small window, such as two by two, over the feature map and summarize each window as one value.
- Max pooling keeps the largest value in each window, which keeps the strongest activation
- Average pooling keeps the mean of the window, which smooths the signal
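A two-by-two pooling window with a stride of two halves both the height and the width, so the feature map shrinks to a quarter of its spatial area. The channel count stays the same, because pooling is applied to each channel independently.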
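As a rough illustration of both forms, the sketch below pools a single-channel four-by-four map with a two-by-two window and a stride of two. It uses NumPy; the function name `pool2d`, its parameters, and the sample values are assumptions made for this example and do not come from any particular library.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Pool a (channels, height, width) array with a square window, no padding."""
    c, h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((c, out_h, out_w), dtype=float)
    reduce = np.max if mode == "max" else np.mean
    for i in range(out_h):
        for j in range(out_w):
            # Slice the same window out of every channel at once.
            window = x[:, i * stride:i * stride + size,
                          j * stride:j * stride + size]
            out[:, i, j] = reduce(window, axis=(1, 2))
    return out

# One channel, 4x4 spatial size (made-up values).
x = np.array([[[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]], dtype=float)

max_out = pool2d(x, mode="max")   # [[4, 2], [2, 8]]
avg_out = pool2d(x, mode="avg")   # [[2.5, 1.0], [0.75, 6.5]]
print(x.shape, "->", max_out.shape)  # (1, 4, 4) -> (1, 2, 2)
```

The explicit loops are chosen for readability; a real framework would use a vectorized or fused kernel instead.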
Why it helps
Pooling has three main effects.
- It lowers the spatial size, so later layers do less work
- It gives some invariance to small shifts, since a pattern that moves a little often still lands in the same pooled cell
- It widens the area that deeper neurons can see, called the receptive field
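The shift tolerance in the list above can be checked with a small experiment. This is a minimal sketch using NumPy; the helper `max_pool_2x2` and the toy input are assumptions for illustration. It also shows the limit of the effect: a shift that crosses a window boundary does change the output.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a (height, width) array with even sides."""
    h, w = x.shape
    # Split each axis into (cell index, position inside cell), then reduce
    # over the two inside-cell axes.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

base = np.zeros((4, 4))
base[0, 0] = 1.0                               # a single strong activation

shifted_inside = np.roll(base, 1, axis=1)      # moves to (0, 1), same 2x2 cell
shifted_across = np.roll(base, 2, axis=1)      # moves to (0, 2), next cell over

same = np.array_equal(max_pool_2x2(base), max_pool_2x2(shifted_inside))
changed = not np.array_equal(max_pool_2x2(base), max_pool_2x2(shifted_across))
print(same, changed)
```

The invariance is therefore partial: it holds for shifts that stay inside one pooling window and not for shifts that move a pattern into a neighboring window.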
Many modern networks also use global average pooling at the end, which averages each channel over its whole height and width, collapsing it to one number before the classifier.
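Global average pooling amounts to a mean over the two spatial axes. The sketch below uses NumPy with made-up sizes (8 channels, a 7 by 7 map, 3 classes) and a random weight matrix standing in for a hypothetical linear classifier; none of these values come from a specific network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Final feature map: (channels, height, width). Sizes are illustrative only.
features = rng.normal(size=(8, 7, 7))

# Global average pooling: one number per channel.
pooled = features.mean(axis=(1, 2))          # shape (8,)

# Hypothetical linear classifier over 3 classes, with random weights.
weights = rng.normal(size=(3, 8))
logits = weights @ pooled                    # shape (3,)

print(features.shape, "->", pooled.shape, "->", logits.shape)
```

Because the pooled vector has one entry per channel regardless of the height and width, the classifier's input size does not depend on the spatial size of the feature map.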
Key idea
Pooling shrinks feature maps by summarizing regions, cutting compute and adding tolerance to small shifts.