Machine Learning

The Anchor Boxes

Using preset box templates so detectors predict offsets, not raw boxes.

A reference for boxes

Predicting object boxes from scratch is hard because position and size vary wildly. Anchor boxes give the detector a set of reference rectangles at every location, each with a fixed scale and aspect ratio.
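A minimal sketch of how such reference rectangles might be laid out, assuming a feature map whose cells are `stride` pixels apart and anchors stored as (center x, center y, width, height). The function name `make_anchors`, the parameter names, and the example scales and ratios are illustrative choices, not taken from any particular detector.

```python
import math

def make_anchors(feat_h, feat_w, stride, scales, ratios):
    """Return a list of (cx, cy, w, h) anchors, one per (cell, scale, ratio)."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell, in image pixels.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio is width / height; the area stays scale ** 2.
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    anchors.append((cx, cy, w, h))
    return anchors

# Hypothetical settings: a 2x2 feature map, 2 scales, 3 aspect ratios.
anchors = make_anchors(feat_h=2, feat_w=2, stride=16,
                       scales=[32, 64], ratios=[0.5, 1.0, 2.0])
print(len(anchors))  # 2 * 2 cells * 2 scales * 3 ratios = 24
```

Every location gets the same set of shapes, so the total anchor count is cells × scales × ratios.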

Predict the adjustment

Instead of raw coordinates, the network predicts a small offset from each anchor: how far to shift the center and how much to rescale the width and height. Learning a correction to a nearby reference is generally easier than learning the absolute box.

  • Several anchors per location cover different shapes.
  • Each anchor also gets an objectness score.
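The lesson does not fix an offset formula, so the sketch below assumes one widely used parametrization: center shifts are measured in units of the anchor's width and height, and size changes are stored as log ratios. Boxes are (center x, center y, width, height), and `encode` and `decode` are illustrative names.

```python
import math

def encode(anchor, box):
    """Offsets that turn `anchor` into `box` (the regression targets)."""
    ax, ay, aw, ah = anchor
    bx, by, bw, bh = box
    return ((bx - ax) / aw,        # center shift, in anchor widths
            (by - ay) / ah,        # center shift, in anchor heights
            math.log(bw / aw),     # log width ratio
            math.log(bh / ah))     # log height ratio

def decode(anchor, offsets):
    """Apply predicted offsets to an anchor to recover a box."""
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = offsets
    return (ax + tx * aw, ay + ty * ah,
            aw * math.exp(tw), ah * math.exp(th))

anchor = (50.0, 50.0, 40.0, 80.0)
truth = (54.0, 46.0, 44.0, 72.0)
offsets = encode(anchor, truth)
recovered = decode(anchor, offsets)  # matches `truth` up to float rounding
```

When the anchor is already close to the object, all four offsets are near zero, which is what makes the regression target small and well scaled.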

Matching during training

Each ground truth box is assigned to the anchors that overlap it strongly, with overlap measured by intersection over union (IoU). Matched anchors learn the offset to the true box; anchors with low overlap learn to predict background. Anchors in between are ignored so that ambiguous cases do not produce noisy targets.
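The matching rule can be sketched as follows, assuming boxes in corner format (x1, y1, x2, y2). The thresholds 0.7 and 0.3 are example values chosen for illustration, and `assign` is a hypothetical helper; real detectors differ in both the thresholds and the tie-breaking rules.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def assign(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Label each anchor as object (with a ground truth index), background, or ignore."""
    labels = []
    for anchor in anchors:
        best_iou, best_gt = 0.0, -1
        for i, gt in enumerate(gt_boxes):
            overlap = iou(anchor, gt)
            if overlap > best_iou:
                best_iou, best_gt = overlap, i
        if best_iou >= pos_thresh:
            labels.append(("object", best_gt))
        elif best_iou < neg_thresh:
            labels.append(("background", -1))
        else:
            labels.append(("ignore", -1))  # in between: no loss contribution
    return labels

gt_boxes = [(0, 0, 10, 10)]
example_anchors = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]
labels = assign(example_anchors, gt_boxes)
# IoUs are 1.0, 0.5 and 0.0, so the labels are object, ignore, background.
```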

Choosing anchors

Scales and aspect ratios should reflect the dataset. Detecting people calls for tall anchors, while detecting cars calls for wide ones. Some methods cluster the training boxes to pick good anchor shapes.
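One way the clustering idea might look is a small k-means over box (width, height) pairs that uses IoU as the similarity measure. This is a sketch under assumptions: the deterministic initialization, the iteration cap, and the names `wh_iou` and `cluster_anchors` are illustrative, and the toy boxes are made up for the example.

```python
def wh_iou(a, b):
    """IoU of two (w, h) shapes, as if they shared the same top-left corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def cluster_anchors(boxes, k, iters=20):
    """Group (w, h) pairs into k anchor shapes by IoU-based k-means."""
    # Deterministic start (an assumption for simplicity): pick k boxes
    # spread evenly across the list sorted by area.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)]
                 for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for box in boxes:
            # Assign each box to the centroid it overlaps most.
            best = max(range(k), key=lambda i: wh_iou(box, centroids[i]))
            groups[best].append(box)
        new_centroids = []
        for i, group in enumerate(groups):
            if group:
                new_centroids.append((sum(b[0] for b in group) / len(group),
                                      sum(b[1] for b in group) / len(group)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

# Toy data: three tall "person-like" boxes and three wide "car-like" boxes.
train_wh = [(20, 60), (22, 66), (18, 54), (80, 40), (90, 44), (70, 36)]
shapes = cluster_anchors(train_wh, k=2)
# One centroid ends up tall (about 20 x 60), the other wide (about 80 x 40).
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, since the measure depends on relative overlap rather than absolute pixel differences.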

Key idea

Anchor boxes are preset reference rectangles so the detector predicts small offsets and an objectness score, with ground truth assigned by IoU and anchor shapes chosen to match the data.

Check yourself

1. What does the network predict relative to an anchor?

2. How are ground truth boxes assigned to anchors?