Image Representation and Channels
To a computer, an image is just numbers. Each picture is a grid of pixels, and each pixel holds one or more intensity values. Understanding this layout is the first step in computer vision.
Width, height, and channels
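A color image is usually stored as a three-dimensional array with shape (height, width, channels). The first two dimensions locate a pixel by row and column, and the third holds its color values. Some libraries place the channel axis first instead, so it is worth checking the convention before indexing.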
- A grayscale image has one channel, a single brightness value per pixel.
- A standard color image has three channels for red, green, and blue.
- Some images add a fourth alpha channel for transparency.
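The three cases above can be sketched as NumPy arrays. This is a minimal illustration, assuming NumPy and a channels-last layout; the image size and variable names are hypothetical. A grayscale image is shown here as a two-dimensional array, although it could equally carry an explicit channel axis of length one.

```python
import numpy as np

height, width = 4, 6  # hypothetical small image size

# Grayscale: one brightness value per pixel, so no channel axis is needed.
gray = np.zeros((height, width), dtype=np.uint8)

# Color: three values (red, green, blue) per pixel.
rgb = np.zeros((height, width, 3), dtype=np.uint8)

# Color with transparency: a fourth alpha channel.
rgba = np.zeros((height, width, 4), dtype=np.uint8)

# Set the pixel at row 1, column 2 to pure red.
rgb[1, 2] = (255, 0, 0)

# Slicing the last axis extracts one channel as a 2D plane.
red_plane = rgb[:, :, 0]

print(gray.shape, rgb.shape, rgba.shape)
print(red_plane.shape, red_plane[1, 2])
```

Note that the row index comes first, so a pixel is addressed as (row, column), which corresponds to (y, x) rather than (x, y).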
Pixel values
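Each value typically ranges from 0 to 255 when stored as an unsigned 8-bit integer (one byte). Zero means no intensity (dark) and 255 means full intensity for that channel. Before training, these values are often rescaled to the range 0 to 1 or normalized, for example by subtracting a per-channel mean and dividing by a per-channel standard deviation, so that training is more stable.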
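As a rough sketch of both preprocessing options, the snippet below rescales a byte image to the 0 to 1 range and then normalizes each channel. It assumes NumPy; the tiny image, the variable names, and the small epsilon added to the standard deviation are illustrative choices, not a prescribed recipe.

```python
import numpy as np

# A hypothetical 2x2 RGB image stored as bytes (uint8, values 0 to 255).
image = np.array(
    [[[0, 128, 255], [255, 255, 255]],
     [[64, 64, 64], [0, 0, 0]]],
    dtype=np.uint8,
)

# Rescale to the 0 to 1 range. Convert to float32 first, since uint8
# cannot hold fractional values.
scaled = image.astype(np.float32) / 255.0

# Normalize each channel to zero mean and unit standard deviation.
# The statistics here come from this one image purely for illustration;
# in practice they are usually computed over a training set.
mean = scaled.mean(axis=(0, 1))  # one value per channel, shape (3,)
std = scaled.std(axis=(0, 1))    # one value per channel, shape (3,)
normalized = (scaled - mean) / (std + 1e-8)  # epsilon guards against division by zero

print(scaled.min(), scaled.max())
print(normalized.mean(axis=(0, 1)))
```

Reducing over axes 0 and 1 leaves one statistic per channel, and broadcasting then applies each statistic to every pixel in that channel.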
Why channels matter
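A convolutional network treats channels as a stack of feature planes. For a color input, the first layer sees three color planes. Each filter in a layer reads all of its input channels at once and produces one output channel, so deeper layers build up many channels that each represent a learned feature rather than a raw color.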
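To make the channel bookkeeping concrete, here is a deliberately naive convolution written with NumPy loops. It is a sketch under stated assumptions, not how any particular library implements the operation: the function name `conv2d`, the channels-last layout, the absence of padding, stride, and bias, and the choice of 16 random filters are all hypothetical.

```python
import numpy as np

def conv2d(image, filters):
    """Naive convolution without padding (computed as cross-correlation).

    image:   (height, width, in_channels)
    filters: (num_filters, k, k, in_channels)
    returns: (height - k + 1, width - k + 1, num_filters)
    """
    h, w, _ = image.shape
    n, k, _, _ = filters.shape
    out = np.zeros((h - k + 1, w - k + 1, n), dtype=np.float32)
    for f in range(n):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                patch = image[i:i + k, j:j + k, :]
                # Each filter covers every input channel; the products are
                # summed over height, width, and channels into one number.
                out[i, j, f] = np.sum(patch * filters[f])
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))                  # hypothetical RGB input
filters = rng.standard_normal((16, 3, 3, 3))   # 16 random 3x3 filters

features = conv2d(image, filters)
print(features.shape)  # (6, 6, 16)
```

The three input channels are consumed by every filter, while the number of output channels equals the number of filters. Stacking such layers is how the channel count grows from 3 to dozens or hundreds of feature planes.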
Key idea
An image is a height by width by channels tensor of pixel intensities, and color is encoded as separate channel planes.