Object Detection Basics

Object detection goes beyond classification by locating objects as well as naming them. The output is a set of bounding boxes, each with a class label and a confidence score.

Boxes and labels

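A bounding box is an axis-aligned rectangle described by its position and size, usually written either as two corners (x_min, y_min, x_max, y_max) or as a center point plus a width and height. Each detection answers two questions at once: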

What is the object, given by a class label.
Where is it, given by the box coordinates.
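A minimal sketch of how a single detection might be stored, assuming corner format in pixels. The `Detection` class and the two conversion helpers are hypothetical names chosen for illustration; real libraries each pick their own conventions.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detection: what (label), how sure (score), and where (box).

    Assumption: the box is in corner format (x_min, y_min, x_max, y_max), in pixels.
    """
    label: str
    score: float
    box: tuple[float, float, float, float]


def center_to_corners(cx: float, cy: float, w: float, h: float) -> tuple[float, float, float, float]:
    """Convert a (center x, center y, width, height) box to corner format."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def corners_to_center(x0: float, y0: float, x1: float, y1: float) -> tuple[float, float, float, float]:
    """Convert a corner-format box back to (center x, center y, width, height)."""
    return ((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)


# Usage: a cat centered at (50, 40), 20 px wide and 10 px tall.
cat = Detection("cat", 0.92, center_to_corners(50, 40, 20, 10))
```

Both formats carry the same information; converting between them is a frequent source of off-by-half-a-box bugs, so it helps to fix one convention early.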

Two main styles

Two-stage detectors first propose candidate regions, then classify and refine each one. They tend to be more accurate but slower.
One-stage detectors predict boxes and classes directly over a grid in a single pass, usually trading some accuracy for speed.
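To make the one-stage idea concrete, here is a sketch of turning a grid of raw predictions into detections. The output layout is a hypothetical assumption, not any specific model's format: each cell holds (objectness, dx, dy, w, h, p_0, ..., p_{C-1}), where dx and dy place the box center inside the cell and w and h are fractions of the image size. Scoring as objectness times the best class probability is a YOLO-style choice, also assumed here.

```python
def decode_grid(grid, image_size, class_names, score_threshold=0.5):
    """Decode a square S x S grid of predictions into (label, score, box) tuples.

    Hypothetical cell layout: (objectness, dx, dy, w, h, *class_probs).
    Boxes are returned in corner format (x_min, y_min, x_max, y_max), in pixels.
    """
    s = len(grid)                 # assumed square grid: s rows, s columns
    cell = image_size / s         # side length of one grid cell in pixels
    detections = []
    for row in range(s):
        for col in range(s):
            objectness, dx, dy, w, h, *class_probs = grid[row][col]
            best = class_probs.index(max(class_probs))   # most likely class
            score = objectness * class_probs[best]
            if score < score_threshold:
                continue                                 # drop weak cells
            # Box center in pixels: cell corner plus the in-cell offset.
            cx = (col + dx) * cell
            cy = (row + dy) * cell
            bw = w * image_size
            bh = h * image_size
            box = (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)
            detections.append((class_names[best], score, box))
    return detections
```

Every cell makes a prediction in the same pass, which is where the speed comes from; a two-stage detector would instead crop and re-examine a smaller set of proposed regions.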

Anchors and grids

Many detectors place a fixed set of anchor boxes, with several sizes and aspect ratios, at every grid position and predict offsets that adjust each one. The network learns which anchors contain objects and how to shift and resize them to fit.
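A sketch of both halves of the anchor idea: laying anchors over a grid, and applying predicted offsets to one of them. The offset decoding uses the (tx, ty, tw, th) parameterization from Faster R-CNN, which many later detectors share; the function names, the center-of-cell placement, and the "ratio = height / width" convention are assumptions for illustration.

```python
import math


def make_anchors(feature_size, stride, scales, ratios):
    """Place anchors (cx, cy, w, h) at the center of every cell of a square grid.

    stride is the cell size in pixels. ratio means height / width, and each
    anchor keeps an area of scale**2 regardless of its ratio.
    """
    anchors = []
    for row in range(feature_size):
        for col in range(feature_size):
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx, cy, w, h))
    return anchors


def apply_offsets(anchor, offsets):
    """Nudge an anchor with predicted offsets (tx, ty, tw, th).

    The center shifts by a fraction of the anchor size; width and height
    are scaled by exp(tw) and exp(th) so they always stay positive.
    """
    cx, cy, w, h = anchor
    tx, ty, tw, th = offsets
    return (cx + tx * w, cy + ty * h, w * math.exp(tw), h * math.exp(th))


# Usage: a 2 x 2 grid with 16 px cells, 2 scales and 3 ratios -> 24 anchors.
anchors = make_anchors(2, 16, [32, 64], [0.5, 1.0, 2.0])
```

Predicting offsets relative to an anchor keeps the regression targets small and similar in range, which is easier to learn than raw pixel coordinates.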

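Detection produces many overlapping boxes for the same object, so a cleanup step called non-maximum suppression (NMS) keeps the highest-scoring box and discards others that overlap it too much. Overlap is measured with intersection over union (IoU): the area where two boxes intersect divided by the area of their union. The same measure judges how well a predicted box matches the true box.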
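A minimal sketch of IoU and greedy non-maximum suppression, assuming corner-format boxes and a single class. In practice NMS is usually run separately per class, and the 0.5 threshold is a common but arbitrary default, not a fixed rule. The function names here are illustrative.

```python
def iou(a, b):
    """Intersection over union of two corner-format boxes (x0, y0, x1, y1)."""
    # Corners of the intersection rectangle (may be empty).
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first.

    Repeatedly keep the best remaining box and drop every remaining box
    whose IoU with it exceeds iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Usage: two near-duplicate boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed; the third box does not overlap at all and survives.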

Key idea

Object detection outputs bounding boxes with class labels and scores, using one- or two-stage designs and anchors to find what is where.
