Image Representation and Channels

To a computer, an image is just numbers. Each picture is a grid of pixels, and each pixel holds one or more intensity values. Understanding this layout is the first step in computer vision.

Width, height, and channels

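A color image is usually stored as a three-dimensional array with shape height by width by channels. The first two dimensions locate a pixel (row, then column), and the third holds its color values. Some libraries instead place the channel dimension first, so it is worth checking the layout before indexing.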

A grayscale image has one channel, a single brightness value per pixel.
A standard color image has three channels for red, green, and blue.
Some images add a fourth alpha channel for transparency.
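
The three cases above can be sketched with NumPy arrays. This is a minimal illustration, assuming the height by width by channels layout described earlier; the 4 by 6 image size and the variable names are made up for the example.

```python
import numpy as np

# Hypothetical image size, chosen only for illustration.
height, width = 4, 6

# Grayscale: one brightness value per pixel. It is often stored as a 2-D
# array (height, width); some tools keep an explicit channel axis of size 1.
gray = np.zeros((height, width), dtype=np.uint8)

# Color: three channels per pixel, assumed here to be in red, green, blue order.
rgb = np.zeros((height, width, 3), dtype=np.uint8)

# Color plus alpha: a fourth channel for transparency.
rgba = np.zeros((height, width, 4), dtype=np.uint8)

# The first two indices locate a pixel (row 1, column 2);
# the last axis holds its color. Here the pixel is set to pure red.
rgb[1, 2] = (255, 0, 0)

# Slicing the last axis extracts a single channel plane.
red_plane = rgb[:, :, 0]

print(gray.shape, rgb.shape, rgba.shape, red_plane.shape)
```

Note that the channel order is a convention of the library that loaded the image, so it should be confirmed rather than assumed.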

Pixel values

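Each value typically ranges from 0 to 255 when stored as an unsigned 8-bit integer (one byte). For each channel, 0 means no intensity and 255 means full intensity, so a pixel with 0 in every channel is black. Training pipelines often rescale these values to the 0 to 1 range, or normalize them to roughly zero mean and unit variance, so that training is more stable.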
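
The rescaling and normalization steps might look like the sketch below. The tiny 1 by 2 image is invented for the example, and computing the mean and standard deviation from the image itself is an assumption made to keep the code self-contained; real pipelines often use statistics computed over a whole training set.

```python
import numpy as np

# A made-up 1x2 RGB image stored as bytes (values 0..255).
pixels = np.array([[[0, 128, 255], [64, 64, 64]]], dtype=np.uint8)

# Rescale to the 0..1 range. Convert to float first so the division
# is not performed on 8-bit integers.
scaled = pixels.astype(np.float32) / 255.0

# Per-channel normalization: subtract the mean and divide by the standard
# deviation of each channel, computed over the height and width axes.
# Assumption: statistics come from this one image, for illustration only.
mean = scaled.mean(axis=(0, 1))
std = scaled.std(axis=(0, 1))
normalized = (scaled - mean) / (std + 1e-8)  # small epsilon avoids division by zero

print(scaled.min(), scaled.max())
print(normalized.mean(axis=(0, 1)))
```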

Why channels matter

A convolutional network treats each channel as a feature plane. For a color input, the first layer sees three color planes, and each of its filters spans all three to produce one output channel. Deeper layers build up many channels that each represent a learned feature rather than a raw color.
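
To make the channel bookkeeping concrete, here is a deliberately naive sketch of a single convolutional layer in NumPy. It is not how any particular framework implements convolution; the function name conv2d_valid, the 8 by 8 input, the 3 by 3 kernel size, and the choice of 16 output channels are all assumptions for illustration, and the filter weights are random rather than learned.

```python
import numpy as np

def conv2d_valid(image, filters):
    """Naive cross-correlation with stride 1 and no padding.

    image:   (height, width, in_channels)
    filters: (kernel_h, kernel_w, in_channels, out_channels)
    returns: (height - kernel_h + 1, width - kernel_w + 1, out_channels)
    """
    h, w, c_in = image.shape
    kh, kw, k_in, c_out = filters.shape
    assert c_in == k_in, "filter depth must match the input channel count"

    out = np.zeros((h - kh + 1, w - kw + 1, c_out), dtype=np.float32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw, :]  # shape (kh, kw, c_in)
            # Each filter sums over height, width, AND all input channels,
            # producing one value per output channel.
            out[i, j, :] = np.tensordot(patch, filters, axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3), dtype=np.float32)                   # 3 color planes in
filters = rng.standard_normal((3, 3, 3, 16)).astype(np.float32)   # 16 filters

features = conv2d_valid(image, filters)
print(features.shape)  # 16 feature planes out
```

The output has 16 channels because there are 16 filters, not because of anything in the input colors; stacking another layer on top would take those 16 planes as its input channels.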

Key idea

An image is a height by width by channels tensor of pixel intensities, and color is encoded as separate channel planes.
