Two stages, shared backbone
Faster R-CNN is a two-stage detector. A first stage proposes candidate object regions, and a second stage classifies and refines them. The key advance was letting both stages share one convolutional backbone, so the expensive feature maps are computed once per image and reused by both.
The region proposal network
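The region proposal network (RPN) slides a small convolutional window over the backbone feature map. At each position it places k anchors, reference boxes of several scales and aspect ratios, and for each anchor outputs:
- An objectness score estimating how likely the anchor contains an object rather than background.
- Box offsets that shift and resize the anchor to fit that object more tightly.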
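Keeping the highest-scoring anchors and suppressing overlapping duplicates leaves a manageable set of high-quality proposals, so the more expensive second stage examines a few hundred regions instead of every anchor.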
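The RPN steps above can be sketched in plain Python: lay anchors over the feature map, decode predicted offsets into boxes, then keep a small set with greedy non-maximum suppression. The helper names (make_anchors, decode, iou, select_proposals), the convention that ratio means height over width, and the defaults (IoU threshold 0.7, 300 kept boxes) are illustrative assumptions. The offset form, center shifts scaled by anchor size and log-space width and height changes, follows the usual R-CNN-family box parameterization.

```python
import math

def make_anchors(feat_h, feat_w, stride, scales, ratios):
    """Anchors (x1, y1, x2, y2) centered on every feature-map cell.

    Assumption: scale is the square root of the anchor area in pixels and
    ratio is height / width, so every (scale, ratio) pair has the same area.
    """
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w, h = s / math.sqrt(r), s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def decode(anchor, deltas):
    """Apply predicted offsets (dx, dy, dw, dh) to one anchor."""
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    # Center shifts are scaled by anchor size; width and height change in log space.
    cx = x1 + w / 2 + deltas[0] * w
    cy = y1 + h / 2 + deltas[1] * h
    w, h = w * math.exp(deltas[2]), h * math.exp(deltas[3])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def select_proposals(boxes, scores, iou_thresh=0.7, top_n=300):
    """Greedy NMS in descending objectness order; return at most top_n indices."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    for k in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[k], boxes[m]) < iou_thresh for m in keep):
            keep.append(k)
            if len(keep) == top_n:
                break
    return keep
```

With three scales and three aspect ratios each cell gets k = 9 anchors, the setting used in the original paper. A full RPN also clips boxes to the image and discards very small ones before suppression, which this sketch omits.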
Aligning the proposals
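Proposals vary in size, so a pooling step extracts a fixed-size feature map for each one. ROI pooling divides each proposal's region of the shared feature map into a fixed grid (for example 7×7) and max-pools each cell. ROI Align, introduced later with Mask R-CNN, instead samples the features with bilinear interpolation and never rounds coordinates to whole cells, which improves localization. Either way, the classifier head sees a uniform input.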
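The difference between the two pooling styles can be sketched on a single-channel feature map stored as a list of rows. The function names, the out_size and samples parameters, and the convention that feat[y][x] is the value at integer point (x, y) are assumptions for illustration; real implementations operate on batched multi-channel tensors and differ in details such as pixel-center offsets and how many points each bin samples.

```python
import math

def roi_max_pool(feat, roi, out_size):
    """ROI pooling: snap the ROI to whole cells, then max-pool an out_size x out_size grid."""
    H, W = len(feat), len(feat[0])
    x1, y1, x2, y2 = (round(v) for v in roi)  # quantization step
    bin_w = max(x2 - x1, 1) / out_size
    bin_h = max(y2 - y1, 1) / out_size
    out = []
    for i in range(out_size):
        # Bin edges are snapped to integers too; every bin covers at least one cell.
        ys = min(max(y1 + math.floor(i * bin_h), 0), H - 1)
        ye = min(max(y1 + math.ceil((i + 1) * bin_h), ys + 1), H)
        row = []
        for j in range(out_size):
            xs = min(max(x1 + math.floor(j * bin_w), 0), W - 1)
            xe = min(max(x1 + math.ceil((j + 1) * bin_w), xs + 1), W)
            row.append(max(feat[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        out.append(row)
    return out

def bilinear(feat, x, y):
    """Bilinearly interpolate feat at a continuous point (x, y), clamped to the map."""
    H, W = len(feat), len(feat[0])
    x = min(max(x, 0.0), W - 1.0)
    y = min(max(y, 0.0), H - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    lx, ly = x - x0, y - y0
    top = feat[y0][x0] * (1 - lx) + feat[y0][x1] * lx
    bottom = feat[y1][x0] * (1 - lx) + feat[y1][x1] * lx
    return top * (1 - ly) + bottom * ly

def roi_align(feat, roi, out_size, samples=2):
    """ROI Align: keep fractional ROI edges and average samples x samples points per bin."""
    x1, y1, x2, y2 = roi  # no rounding
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    # Sample points are evenly spaced inside the bin.
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear(feat, x, y)
            row.append(total / samples ** 2)
        out.append(row)
    return out
```

On a proposal such as (0.4, 0.4, 2.4, 2.4), roi_max_pool snaps the edges to (0, 0, 2, 2) while roi_align keeps the fractional edges; that rounding error is what ROI Align was designed to remove.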
The second stage
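The head then assigns each proposal a class (or background) and a final, class-specific box refinement. Because the proposal network reuses the backbone's features, generating proposals adds little compute; replacing the separate, slow proposal algorithm that Fast R-CNN relied on with this nearly free step is what made the method faster and gave it its name.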
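A sketch of the per-proposal decision in the second stage, assuming the head emits K + 1 class logits (index 0 taken as background here) and one (dx, dy, dw, dh) offset tuple per foreground class, decoded the same way as the RPN offsets. The function names and the single score_thresh cutoff are hypothetical simplifications; practical detectors usually score every class separately and run per-class non-maximum suppression afterwards, which this sketch leaves out.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_deltas(box, d):
    """Refine a proposal (x1, y1, x2, y2) with offsets (dx, dy, dw, dh)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx = x1 + w / 2 + d[0] * w
    cy = y1 + h / 2 + d[1] * h
    w, h = w * math.exp(d[2]), h * math.exp(d[3])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def classify_proposal(proposal, class_logits, box_deltas, score_thresh=0.5):
    """Return (class_id, score, refined_box), or None if no class is confident enough.

    Assumed layout: class_logits[0] is background; box_deltas[c - 1]
    holds the offsets for foreground class c.
    """
    probs = softmax(class_logits)
    # Pick the most likely foreground class.
    cls = max(range(1, len(probs)), key=lambda c: probs[c])
    if probs[cls] < score_thresh:
        return None  # treated as background
    return cls, probs[cls], apply_deltas(proposal, box_deltas[cls - 1])
```

Keeping one offset tuple per class lets the head refine the same proposal differently depending on which class it predicts.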
Key idea
Faster R-CNN shares a backbone between a region proposal network and a classification head, using ROI pooling to give each proposal a fixed-size feature, which makes accurate two-stage detection efficient.