Machine Learning

YOLO Object Detection

Predicting all boxes in one forward pass over a grid.

Detection in one shot

YOLO (You Only Look Once) treats detection as a single regression problem over the whole image. One forward pass outputs every box and class prediction, which makes it fast enough for real-time use.

The grid view

The image is divided into a grid of cells. Each cell is responsible for the objects whose center falls inside it, and for those objects it predicts:

  • A set of boxes with positions and sizes.
  • An objectness score per box.
  • Class probabilities for the cell.

This single-stage design contrasts with two-stage methods, which first propose candidate regions and then classify each one.
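The grid assignment and the size of the prediction can be sketched in a few lines. This is a minimal illustration, not any particular YOLO implementation: the function names are hypothetical, box centers are assumed to be given in pixels, and each box is assumed to carry five numbers (x, y, width, height, objectness) with one set of class probabilities per cell.

```python
def responsible_cell(cx, cy, img_w, img_h, grid_size):
    """Return (row, col) of the grid cell that contains the box center.

    cx, cy are pixel coordinates of the object's center (assumed input format).
    """
    # Scale the center into grid units, then clamp so a center lying exactly
    # on the right or bottom edge still maps to the last cell.
    col = min(int(cx / img_w * grid_size), grid_size - 1)
    row = min(int(cy / img_h * grid_size), grid_size - 1)
    return row, col


def output_shape(grid_size, boxes_per_cell, num_classes):
    """Shape of the single prediction tensor under the assumptions above."""
    # Each box contributes x, y, w, h and an objectness score (5 values);
    # class probabilities are predicted once per cell.
    return (grid_size, grid_size, boxes_per_cell * 5 + num_classes)


# Illustrative values: a 448x448 image, a 7x7 grid, 2 boxes per cell, 20 classes.
print(responsible_cell(300, 100, 448, 448, 7))  # (1, 4)
print(output_shape(7, 2, 20))                   # (7, 7, 30)
```

With these example numbers, the whole detection result is one 7 x 7 x 30 array produced by a single forward pass.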

Why it is fast

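Because the network produces everything at once, there is no separate proposal pass and no repeated cropping of candidate regions. The whole image goes through the network exactly once, so the cost per image does not grow with the number of objects or candidate regions.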

The trade

Early versions struggled with small or clustered objects because each cell predicts only a limited number of boxes, so several objects whose centers share a cell cannot all be detected. Later versions added anchors, multi-scale features, and finer grids to close much of the accuracy gap with two-stage detectors.
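A small sketch can make the crowding problem concrete. It counts how many object centers land in each cell and reports the cells that hold more objects than the cell can predict. The helper name `overloaded_cells`, the pixel-coordinate input, and the use of `boxes_per_cell` as a hard capacity are all assumptions made for illustration.

```python
from collections import Counter


def overloaded_cells(centers, img_w, img_h, grid_size, boxes_per_cell):
    """Return {(row, col): count} for cells with more centers than boxes."""
    counts = Counter(
        (
            min(int(cy / img_h * grid_size), grid_size - 1),  # row
            min(int(cx / img_w * grid_size), grid_size - 1),  # col
        )
        for cx, cy in centers
    )
    return {cell: n for cell, n in counts.items() if n > boxes_per_cell}


# Three small objects close together (hypothetical pixel centers), 448x448 image.
flock = [(100, 100), (120, 90), (90, 120)]

# Coarse 7x7 grid: all three centers share one cell, which predicts only 2 boxes.
print(overloaded_cells(flock, 448, 448, 7, 2))   # {(1, 1): 3}

# Finer 28x28 grid: each center gets its own cell, so nothing is overloaded.
print(overloaded_cells(flock, 448, 448, 28, 2))  # {}
```

The second call hints at why finer grids help: the same objects spread across more cells, so fewer of them compete for one cell's predictions.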

Key idea

YOLO predicts all boxes, objectness scores, and classes in one pass over a grid. It trades some accuracy on small or clustered objects for real-time speed, and later versions recovered much of that accuracy.

Check yourself

1. Why is YOLO fast compared to two-stage detectors?

2. Which cell is responsible for an object?