The Semantic Segmentation UNet

Encoding then decoding with skip links to label every pixel.

A label per pixel

Semantic segmentation assigns a class to every pixel rather than one label per image. The output must therefore have the same height and width as the input, which a plain classifier cannot provide because it emits a single vector of class scores for the whole image.
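A minimal sketch of the difference, in plain Python. The score values and the helper names segment and classify are made up for illustration; a real network would produce the scores.

```python
# Hypothetical class scores for a 2x3 image with 3 classes.
# scores[row][col] holds one score per class for that pixel.
scores = [
    [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.2, 0.2, 0.6]],
    [[0.3, 0.3, 0.4], [0.9, 0.05, 0.05], [0.1, 0.8, 0.1]],
]


def segment(scores):
    """Segmentation: pick the best class at every pixel (output is H x W)."""
    return [
        [max(range(len(px)), key=px.__getitem__) for px in row]
        for row in scores
    ]


def classify(scores):
    """Classification: pool scores over the whole image, emit one label."""
    n_classes = len(scores[0][0])
    totals = [sum(px[k] for row in scores for px in row) for k in range(n_classes)]
    return max(range(n_classes), key=totals.__getitem__)


mask = segment(scores)         # [[1, 0, 2], [2, 0, 1]]
image_label = classify(scores)  # 0
```

The mask has one entry per input pixel, while the classifier collapses the same scores into a single number.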

The encoder decoder shape

UNet has a symmetric design:

  • The encoder downsamples, building rich but coarse features.
  • The decoder upsamples back to full resolution.

The shape looks like the letter U, which gives the name.
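The U can be traced by following the spatial size of the feature maps. This sketch assumes each encoder level halves the resolution, each decoder level doubles it, and padding keeps sizes aligned; the function name unet_resolutions and the 256-pixel input are illustrative choices, not part of any library.

```python
def unet_resolutions(size, levels):
    """Return the spatial sizes seen on the way down and on the way back up.

    Assumes `size` is divisible by 2 ** levels so every halving is exact.
    """
    if size % (2 ** levels) != 0:
        raise ValueError("size must be divisible by 2 ** levels")
    # Down the left arm of the U: halve at every level.
    encoder = [size // (2 ** i) for i in range(levels + 1)]
    # Up the right arm: revisit the same sizes in reverse, skipping the bottom.
    decoder = encoder[-2::-1]
    return encoder, decoder


enc, dec = unet_resolutions(256, 4)
# enc is [256, 128, 64, 32, 16]
# dec is [32, 64, 128, 256]
```

Each decoder size has a matching encoder size, which is what makes the design symmetric.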

The skip connections

Downsampling discards precise location. UNet compensates with skip connections that copy the feature maps from each encoder level to the decoder level at the same resolution, where they are concatenated with the upsampled features. The decoder then combines coarse semantics with the sharp spatial detail saved before pooling, producing crisp boundaries.
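A toy illustration of why the skip link helps, using single-channel grids in plain Python. The helpers max_pool_2x2, upsample_2x, and concat_channels are hypothetical stand-ins for the pooling, upsampling, and channel concatenation a real UNet would apply to learned feature maps.

```python
def max_pool_2x2(img):
    """Downsample by taking the max of each 2x2 block."""
    return [
        [max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
         for c in range(0, len(img[0]), 2)]
        for r in range(0, len(img), 2)
    ]


def upsample_2x(img):
    """Nearest-neighbour upsampling: repeat every value in a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def concat_channels(a, b):
    """Stack two same-sized maps so each pixel holds [a_value, b_value]."""
    return [[[x, y] for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# A sharp vertical edge in the last column only.
fine = [[0, 0, 0, 1] for _ in range(4)]

coarse = max_pool_2x2(fine)        # [[0, 1], [0, 1]]
restored = upsample_2x(coarse)     # edge is now two columns wide
merged = concat_channels(restored, fine)  # skip link: keep both views
```

After pooling and upsampling alone, the edge has smeared from one column to two. In merged, the pixel at row 0, column 2 holds [1, 0]: the coarse path says "edge" while the saved encoder detail says "not yet", which is the information later layers can use to place the boundary precisely.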

Why it works well on few images

UNet was designed for biomedical image segmentation, where labeled data is scarce. The skip links and heavy data augmentation (elastic deformations in the original work) let it learn precise masks from small datasets, which is one reason it remains a common default for dense prediction.
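The key constraint in segmentation augmentation is that the image and its mask must receive the same geometric transform, or the labels stop lining up with the pixels. The sketch below uses simple flips as a stand-in for heavier augmentation; the helper names and the tiny example grids are illustrative only.

```python
def hflip(grid):
    """Mirror a grid left to right."""
    return [row[::-1] for row in grid]


def vflip(grid):
    """Mirror a grid top to bottom."""
    return [list(row) for row in grid[::-1]]


def augment_pair(image, mask):
    """Return four (image, mask) pairs, applying each flip to both together."""
    transforms = [
        lambda g: [list(row) for row in g],  # identity copy
        hflip,
        vflip,
        lambda g: hflip(vflip(g)),
    ]
    return [(t(image), t(mask)) for t in transforms]


image = [[1, 2, 3],
         [4, 5, 6]]
mask = [[0, 0, 1],
        [0, 1, 1]]

samples = augment_pair(image, mask)  # one labeled example becomes four
```

Every pixel value keeps its original label in all four samples, so a small labeled set is stretched without corrupting the masks.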

Key idea

UNet encodes an image into coarse semantic features, then decodes back to full resolution, using skip connections to restore spatial detail and produce crisp per-pixel masks even from small datasets.

Check yourself

1. What do UNet skip connections restore?

2. Why does semantic segmentation need a full-size output?