Machine Learning

Faster RCNN for Object Detection

Sharing features between a region proposer and a box classifier.

Two stages, shared backbone

Faster RCNN is a two-stage detector: a first stage proposes candidate regions, and a second stage classifies and refines them. Its key advance was sharing one convolutional backbone between both stages, so the image features are computed once and reused.

The region proposal network

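The region proposal network slides over the backbone feature map. At each location it places anchors, which are reference boxes of preset sizes and aspect ratios, and for every anchor it outputs:

  • An objectness score saying how likely the anchor is to contain any object.
  • Box offsets that refine the anchor's position and size.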

It produces a manageable set of high-quality proposals, so the second stage does not have to scan every location and scale exhaustively.
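The anchor mechanism can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names, the anchor sizes and ratios, and the stride are all assumptions chosen for the example, and the scores and offsets are stand-ins for what a trained network would predict. The offset format (dx, dy, dw, dh) follows the usual RCNN box parameterisation. Real systems also apply non-maximum suppression, which is omitted here.

```python
import math

def make_anchors(feat_h, feat_w, stride, sizes=(64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) tuples, several per feature-map cell."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this cell, mapped back to image pixels.
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            for size in sizes:
                for ratio in ratios:  # ratio = height / width, area stays size * size
                    w = size / math.sqrt(ratio)
                    h = size * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def decode(anchor, delta):
    """Apply (dx, dy, dw, dh) offsets to one anchor."""
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    dx, dy, dw, dh = delta
    new_cx, new_cy = cx + dx * w, cy + dy * h          # shift relative to anchor size
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)  # rescale in log space
    return (new_cx - new_w / 2, new_cy - new_h / 2,
            new_cx + new_w / 2, new_cy + new_h / 2)

def top_proposals(anchors, scores, deltas, k):
    """Keep the k anchors with the highest objectness and refine them."""
    order = sorted(range(len(anchors)), key=lambda i: scores[i], reverse=True)[:k]
    return [(decode(anchors[i], deltas[i]), scores[i]) for i in order]

anchors = make_anchors(feat_h=2, feat_w=3, stride=16)
scores = [i / len(anchors) for i in range(len(anchors))]  # stand-in objectness scores
deltas = [(0.0, 0.0, 0.0, 0.0)] * len(anchors)            # zero offsets keep anchors as-is
proposals = top_proposals(anchors, scores, deltas, k=5)
```

A 2 by 3 feature map with 6 anchor shapes per cell gives 36 anchors, of which only the 5 best-scoring ones are passed on.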

Aligning the proposals

Proposed regions vary in size, so a pooling step extracts a fixed-size feature for each. ROI pooling, later refined into ROI align, crops each proposal from the shared feature map and resizes it so the classifier head sees a uniform input.
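To make the idea concrete, here is a small sketch of ROI max pooling on a single-channel feature map stored as nested lists. The function name, the 2 by 2 output size, and the convention that the region is given in feature-map cells with an exclusive end are assumptions for this example. ROI align differs in that it samples with bilinear interpolation instead of snapping bins to whole cells.

```python
import math

def roi_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one region of a 2D feature map into an out_h x out_w grid.

    feature_map: list of rows (one channel, for brevity).
    roi: (x1, y1, x2, y2) in feature-map cells, end-exclusive, inside the map.
    """
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Quantised bin edges along the height; every bin covers at least one row.
        r0 = y1 + math.floor(i * roi_h / out_h)
        r1 = max(y1 + math.ceil((i + 1) * roi_h / out_h), r0 + 1)
        row = []
        for j in range(out_w):
            c0 = x1 + math.floor(j * roi_w / out_w)
            c1 = max(x1 + math.ceil((j + 1) * roi_w / out_w), c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# A 4 x 6 toy feature map whose value encodes its position.
fmap = [[r * 6 + c for c in range(6)] for r in range(4)]

square = roi_pool(fmap, (0, 0, 4, 4))  # 4 x 4 region -> [[7, 9], [19, 21]]
wide = roi_pool(fmap, (1, 0, 6, 3))    # 5 x 3 region -> [[9, 11], [15, 17]]
```

Both regions come out as 2 by 2 grids even though their shapes differ, which is what lets one head process every proposal.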

The second stage

The head then assigns a class and a final box refinement to each proposal. Because both stages share features, the proposal step adds almost no extra compute, and that speedup is what gave the method its name.
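What the head does per proposal can be sketched as follows. This is an illustrative assumption of the output handling only: the helper names, the label list, the convention that index 0 is background, and the hand-written logits and offsets are all hypothetical, standing in for the outputs of trained layers. It assumes one set of box offsets per class.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def refine(box, delta):
    """Shift and rescale (x1, y1, x2, y2) by (dx, dy, dw, dh)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx = x1 + w / 2 + delta[0] * w
    cy = y1 + h / 2 + delta[1] * h
    w, h = w * math.exp(delta[2]), h * math.exp(delta[3])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def head_predict(proposal, class_logits, class_deltas, labels):
    """Pick the most likely class and apply that class's box refinement."""
    probs = softmax(class_logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], probs[best], refine(proposal, class_deltas[best])

labels = ["background", "cat", "dog"]  # hypothetical label set
label, score, box = head_predict(
    proposal=(10.0, 10.0, 50.0, 30.0),
    class_logits=[0.1, 2.0, 0.3],
    class_deltas=[(0.0, 0.0, 0.0, 0.0), (0.1, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)],
    labels=labels,
)
```

Here the proposal is labelled "cat" and its box is nudged right by a tenth of its width. In a full detector, proposals whose best class is background would be dropped.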

Key idea

Faster RCNN shares a backbone between a region proposal network and a classification head, and uses ROI pooling to give fixed-size features per proposal. Together these make accurate two-stage detection efficient.

Check yourself

1. What does the region proposal network output?

2. Why is ROI pooling needed?

3. What makes the proposal stage cheap?