Object Detection with YOLO

Detection in one shot

YOLO (You Only Look Once) treats detection as a single regression problem over the whole image. One forward pass of the network outputs every bounding box and its class probabilities at once, which makes it fast enough for real-time use.

The grid view

The image is divided into a grid of cells. Each cell is responsible for detecting any object whose center falls inside it, and it predicts:

A fixed number of bounding boxes, each with a center position, width, and height.
An objectness (confidence) score per box.
One set of class probabilities shared by the whole cell.
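A minimal sketch of the grid bookkeeping, assuming the original YOLOv1 settings (a 7 × 7 grid, 2 boxes per cell, 20 PASCAL VOC classes). It shows how the per-cell predictions add up to one output tensor, which cell is responsible for an object, and how a box's objectness is combined with its cell's class probabilities into a class-specific score. `GridConfig`, `encode_center`, and `class_scores` are hypothetical helpers for illustration, not part of any YOLO codebase.

```python
from dataclasses import dataclass


@dataclass
class GridConfig:
    S: int = 7   # the image is split into an S x S grid
    B: int = 2   # boxes predicted by each cell
    C: int = 20  # number of object classes

    def output_shape(self) -> tuple[int, int, int]:
        # Each box carries x, y, w, h and an objectness score (5 numbers);
        # the C class probabilities are shared by the whole cell.
        return (self.S, self.S, self.B * 5 + self.C)


def encode_center(cx: float, cy: float, img_w: float, img_h: float, S: int):
    """Return (row, col, x_off, y_off) for an object centered at (cx, cy) in pixels.

    (row, col) is the responsible cell; x_off and y_off locate the center
    inside that cell as fractions of the cell size.
    """
    gx = cx / img_w * S  # center expressed in grid units
    gy = cy / img_h * S
    col = min(int(gx), S - 1)  # clamp centers lying exactly on the right/bottom edge
    row = min(int(gy), S - 1)
    return row, col, gx - col, gy - row


def class_scores(objectness: float, class_probs: list[float]) -> list[float]:
    """Class-specific score for one box: objectness times each class probability."""
    return [objectness * p for p in class_probs]


if __name__ == "__main__":
    cfg = GridConfig()
    print(cfg.output_shape())                        # (7, 7, 30)
    print(encode_center(224, 224, 448, 448, cfg.S))  # (3, 3, 0.5, 0.5)
    print(class_scores(0.5, [0.2, 0.8]))
```

With these settings every image yields the same 7 × 7 × 30 tensor, which is why a single forward pass is enough to read off all detections.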

This single-stage design contrasts with two-stage methods, which first propose candidate regions and then classify each one.

Why it is fast

Because the network produces every prediction in a single pass, there is no separate region-proposal stage and no repeated cropping and re-evaluation of image patches. The whole image is processed once, which keeps inference fast.

The trade-off

Early versions struggled with small or tightly clustered objects because each cell predicts only a limited number of boxes. Later versions added anchor boxes, multi-scale features, and finer grids, closing much of the accuracy gap with two-stage detectors.
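To illustrate what anchors change, here is a sketch in the style of YOLOv2: a ground-truth box is matched to the anchor whose shape overlaps it best (IoU computed on width and height only, as if both boxes shared a center), and the raw outputs are decoded relative to that anchor as b_x = sigmoid(t_x) + c_x, b_y = sigmoid(t_y) + c_y, b_w = p_w * exp(t_w), b_h = p_h * exp(t_h). The anchor sizes are made up for the example, and `wh_iou`, `best_anchor`, and `decode` are hypothetical names rather than any library's API.

```python
import math

# Hypothetical anchor shapes (width, height) in pixels; real models cluster or learn their own.
ANCHORS = [(10.0, 14.0), (60.0, 40.0), (150.0, 300.0)]


def wh_iou(a: tuple[float, float], b: tuple[float, float]) -> float:
    """IoU of two boxes compared by shape only, as if they shared a center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def best_anchor(box_wh: tuple[float, float], anchors: list[tuple[float, float]]) -> int:
    """Index of the anchor whose shape best matches the ground-truth box."""
    return max(range(len(anchors)), key=lambda i: wh_iou(box_wh, anchors[i]))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode(tx, ty, tw, th, col, row, anchor):
    """YOLOv2-style decoding of raw outputs for one anchor in cell (row, col).

    The center (bx, by) is in grid units and stays inside the cell because the
    sigmoid bounds the offset to (0, 1); the size scales the anchor's width and height.
    """
    bx = sigmoid(tx) + col
    by = sigmoid(ty) + row
    bw = anchor[0] * math.exp(tw)
    bh = anchor[1] * math.exp(th)
    return bx, by, bw, bh


if __name__ == "__main__":
    print(best_anchor((12.0, 16.0), ANCHORS))    # 0: small box matches the small anchor
    print(best_anchor((140.0, 280.0), ANCHORS))  # 2
    print(decode(0.0, 0.0, 0.0, 0.0, col=3, row=3, anchor=ANCHORS[1]))  # (3.5, 3.5, 60.0, 40.0)
```

Matching on shape alone keeps assignment cheap, since position is already handled by which cell contains the center; giving each cell several anchors of different shapes is one way more than one nearby object can be predicted from the same cell.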

Key idea