The Semantic Segmentation UNet

A label per pixel

Semantic segmentation assigns a class to every pixel rather than one label per image. This needs an output the same size as the input, which a plain classifier cannot give.

The encoder decoder shape

UNet has a symmetric design:

The encoder downsamples, building rich but coarse features.
The decoder upsamples back to full resolution.

The shape looks like the letter U, which gives the name.

The skip connections

Downsampling loses precise location. UNet fixes this by passing skip connections from each encoder level to the matching decoder level. The decoder then combines coarse semantics with the sharp spatial detail saved before pooling, producing crisp boundaries.

Why it works well on few images

UNet was designed for medical images where labeled data is scarce. The skip links and heavy augmentation let it learn precise masks from small datasets, which is why it remains a default for dense prediction.

Key idea