What it is
The encoder-decoder architecture is a general pattern for transforming a structured input into a structured output. An encoder builds a representation of the input, and a decoder produces the output from that representation.
The two roles
The split divides the work cleanly.
- The encoder focuses on understanding, mapping the input to a rich internal representation
- The decoder focuses on generation, producing the output token by token or pixel by pixel
- A connecting representation, often read through attention, links the two
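The three parts above can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not a trained model: the vocabulary, the hand-wired queries, and the `sharpness` parameter are all hypothetical choices made so that the example reverses its input. The encoder turns each token into a key and a value, and the decoder emits one token per step by attending over that memory.

```python
import math

VOCAB = ["a", "b", "c", "d"]  # hypothetical toy vocabulary


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def encode(tokens):
    """Encoder: map each token to a (key, value) pair.

    The key is a one-hot position and the value is a one-hot token identity.
    A real encoder would learn these representations.
    """
    n = len(tokens)
    return [
        (one_hot(pos, n), one_hot(VOCAB.index(tok), len(VOCAB)))
        for pos, tok in enumerate(tokens)
    ]


def attend(query, memory, sharpness=10.0):
    """Dot-product attention: softmax over query-key scores, then a weighted sum of values."""
    scores = [sharpness * sum(q * k for q, k in zip(query, key)) for key, _ in memory]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(memory[0][1])
    return [
        sum((w / total) * value[d] for w, (_, value) in zip(weights, memory))
        for d in range(dim)
    ]


def decode(memory):
    """Decoder: generate the output one token at a time from the encoder memory."""
    n = len(memory)
    output = []
    for step in range(n):
        # Hand-wired query (an assumption of this toy): look at the mirrored position.
        query = one_hot(n - 1 - step, n)
        context = attend(query, memory)
        output.append(VOCAB[context.index(max(context))])
    return output


print(decode(encode(list("abcd"))))  # ['d', 'c', 'b', 'a']
```

In a trained model the keys, values, and queries would come from learned weights, but the division of labor is the same: `encode` never produces output tokens, and `decode` only sees the input through the memory.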
Transformers come in three variants built on these parts. Encoder-only models, commonly used as classifiers, focus on understanding. Decoder-only models, like many chat systems, focus on generation. Full encoder-decoder models handle transformation tasks.
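One common way to see the difference between the three variants is through their attention masks. The sketch below assumes the usual Transformer convention, in which encoders attend in both directions, decoders attend only to earlier positions, and cross-attention lets every output position read the whole encoded input. The function names are hypothetical and chosen for illustration.

```python
def bidirectional_mask(n):
    """Encoder self-attention: every position may attend to every position."""
    return [[True] * n for _ in range(n)]


def causal_mask(n):
    """Decoder self-attention: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def cross_mask(target_len, source_len):
    """Cross-attention: every output position may attend to every input position."""
    return [[True] * source_len for _ in range(target_len)]


# Encoder-only:    bidirectional_mask over the input.
# Decoder-only:    causal_mask over the single running sequence.
# Encoder-decoder: bidirectional_mask on the input, causal_mask on the output,
#                  and cross_mask connecting the two.
for row in causal_mask(3):
    print(row)
```

The causal mask is what makes token-by-token generation possible: a position can never depend on tokens that have not been produced yet.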
Where it appears
The pattern recurs across text, vision, and speech systems.
- Translation and summarization are classic uses of a full encoder-decoder
- Image segmentation networks encode an image, then decode a per-pixel map
- Speech systems encode audio and decode text
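The segmentation case can be illustrated with a deliberately simple stand-in. The sketch assumes a grayscale image with even height and width, uses 2x2 average pooling in place of a learned encoder, and uses upsampling plus a threshold in place of a learned decoder. The names `encode_image`, `decode_mask`, and `threshold` are hypothetical.

```python
def encode_image(image):
    """Encoder stand-in: average each 2x2 block into one coarse feature."""
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def decode_mask(features, threshold=0.5):
    """Decoder stand-in: expand each feature to a 2x2 block of 0/1 pixel labels."""
    mask = []
    for feature_row in features:
        labels = []
        for value in feature_row:
            labels.extend([1 if value > threshold else 0] * 2)
        mask.append(labels)
        mask.append(list(labels))  # repeat the row to restore the original height
    return mask


image = [
    [0.9, 0.8, 0.1, 0.0],
    [1.0, 0.7, 0.2, 0.1],
    [0.1, 0.0, 0.9, 1.0],
    [0.2, 0.1, 0.8, 0.9],
]
for row in decode_mask(encode_image(image)):
    print(row)
```

The output has the same height and width as the input, which is the defining property of the "pixel map" a segmentation decoder produces. Translation and speech systems follow the same shape, with token or audio-frame sequences in place of the pixel grid.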
Key idea
The encoder-decoder architecture separates understanding the input from generating the output, and connects the two through a shared representation.