The Feature Pyramid Network

The scale problem

Objects appear at many sizes. Deep features are semantically rich but spatially coarse, while shallow features are spatially precise but semantically weak. Neither alone handles both small and large objects well.

The pyramid idea

A feature pyramid network builds a multi-scale set of feature maps that are all semantically strong. It takes the bottom-up backbone features and adds a top-down path that carries high-level meaning back down to higher resolutions.

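The bottom-up path is the normal backbone, producing progressively coarser, deeper maps.
The top-down path starts from the deepest map and upsamples it by a factor of 2 at each step.
Lateral connections take the backbone map with the same spatial size, project it to a common channel width with a 1x1 convolution, and add it to the upsampled map at each level.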
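The three steps above can be sketched in a few lines of NumPy. This is a minimal sketch, assuming nearest-neighbour 2x upsampling, 1x1 lateral projections to a common width `d`, and element-wise addition, as in the original FPN design. The function names, shapes, and random weights are hypothetical stand-ins for a trained network, and the 3x3 convolution that the original design applies to each merged map to reduce upsampling aliasing is omitted.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(backbone_maps, lateral_weights):
    """Build pyramid maps from backbone maps ordered fine -> coarse.

    Each backbone map is assumed to be exactly half the spatial size of the
    previous one. lateral_weights[i] has shape (d, C_i), so every output level
    ends up with the same channel width d.
    """
    laterals = [conv1x1(c, w) for c, w in zip(backbone_maps, lateral_weights)]
    outputs = [laterals[-1]]  # coarsest level: lateral projection only
    for lat in reversed(laterals[:-1]):
        # Upsample the current finest output and add the matching lateral map.
        outputs.insert(0, lat + upsample2x(outputs[0]))
    return outputs  # ordered fine -> coarse

# Usage with random stand-ins for four backbone stages (hypothetical sizes).
rng = np.random.default_rng(0)
channels = [64, 128, 256, 512]
sizes = [32, 16, 8, 4]
backbone = [rng.standard_normal((c, s, s)) for c, s in zip(channels, sizes)]
d = 16
weights = [rng.standard_normal((d, c)) * 0.1 for c in channels]
pyramid = fpn_top_down(backbone, weights)
for p in pyramid:
    print(p.shape)
```

Every output keeps the resolution of its backbone stage but now has `d` channels and carries information propagated down from the deepest map.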

Why fusion helps

Each pyramid level now combines strong semantics with appropriate resolution. Small objects are detected on fine levels and large objects on coarse levels, all sharing rich features.
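To make "small objects on fine levels, large objects on coarse levels" concrete, the original FPN paper assigns a region of width w and height h to level P_k with k = floor(k0 + log2(sqrt(w*h) / 224)), where k0 = 4 and 224 is the ImageNet pre-training size; P_k has stride 2^k. The sketch below implements that rule for region-based detectors, with clamping to P2..P5 as an assumption about which levels exist. Single-stage detectors often achieve the same effect differently, for example by giving each level its own anchor sizes.

```python
import math

def assign_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """Return the pyramid level k (P_k) for a box of size w x h in input pixels.

    Smaller boxes get smaller k, i.e. finer, higher-resolution maps.
    k_min and k_max assume the pyramid spans P2..P5.
    """
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))

for size in (32, 56, 112, 224, 448, 1000):
    print(f"{size}x{size} box -> P{assign_level(size, size)}")
```

A 224x224 box lands on P4 and a 112x112 box on P3: each halving of the box side moves it one level finer, until the clamp at the finest available level.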

How detectors use it

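The detector head runs on every pyramid level. Because all levels share the same channel width, one head with shared weights can be applied to each of them. This design replaced earlier image pyramids, which resized the input to several scales and ran the whole network on each copy; with a feature pyramid the backbone runs once per image, which saves a large amount of compute.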
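Weight sharing across levels can be illustrated with a minimal sketch. Here a single hypothetical 1x1 prediction layer (one score plus four box offsets per location) stands in for a real detector head, which would usually be deeper; the pyramid maps are random stand-ins with an assumed common width `d`. The point is only that one set of parameters produces a prediction map at every level.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16            # assumed common channel width of all pyramid levels
num_outputs = 5   # hypothetical: 1 objectness score + 4 box offsets per location

# One set of head parameters, shared by every pyramid level.
head_w = rng.standard_normal((num_outputs, d)) * 0.1
head_b = np.zeros(num_outputs)

def shared_head(p):
    """Apply the shared 1x1 prediction layer to a (d, H, W) map -> (num_outputs, H, W)."""
    return np.einsum("od,dhw->ohw", head_w, p) + head_b[:, None, None]

# Random stand-ins for pyramid levels P2..P5, ordered fine -> coarse.
pyramid = [rng.standard_normal((d, s, s)) for s in (32, 16, 8, 4)]
predictions = [shared_head(p) for p in pyramid]
for p, out in zip(pyramid, predictions):
    print(p.shape[1:], "->", out.shape)

total_locations = sum(out.shape[1] * out.shape[2] for out in predictions)
print("prediction locations across all levels:", total_locations)
```

Fine levels contribute most of the prediction locations, which is where small objects are found, while the head's parameter count stays the same no matter how many levels the pyramid has.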

Key idea

A feature pyramid network adds a top-down path with lateral connections so that every scale has both strong semantics and the appropriate resolution, letting one detector handle objects of all sizes efficiently.
