Semantic Segmentation

Semantic segmentation assigns a class label to every pixel in an image. Instead of one label per image or per box, the output is a full label map at image resolution.

Dense prediction

The output has the same height and width as the input, with each pixel holding a predicted class. This is called dense prediction because every location gets an answer, not just a box.

Classification gives one label for the whole image.
Detection gives a box per object.
Segmentation gives a class per pixel.

Encoder decoder design

A typical network has two halves.

The encoder downsamples with convolutions and pooling, building rich but coarse features.
The decoder upsamples those features back to full resolution to produce the label map.

Skip connections carry fine detail from early encoder layers across to the decoder, sharpening object boundaries that pooling would otherwise blur.

Semantic versus instance

Plain semantic segmentation does not separate two objects of the same class; all cars share one label. Instance segmentation goes further and gives each car its own mask. The choice depends on whether counting individual objects matters.

Key idea

Semantic segmentation labels every pixel with a class using an encoder decoder with skip connections, producing a full resolution map.

Semantic Segmentation