Two stages
An encoder-decoder architecture maps a variable-length input sequence to a variable-length output sequence using two cooperating networks.
- The encoder reads the full input and compresses it into a representation.
- The decoder generates the output one token at a time from that representation.
This pattern underlies machine translation, summarization, and speech-to-text systems.
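A minimal sketch of the two-stage split follows. It assumes a toy "reverse the input" task and uses hand-written rules in place of trained networks. The names `encode`, `decode`, and `END` are hypothetical. The point is only the data flow: the encoder consumes the whole input first, then the decoder emits tokens one step at a time and decides for itself when to stop.

```python
from typing import List, Sequence

END = "<end>"  # hypothetical end-of-sequence marker

def encode(tokens: Sequence[str]) -> List[str]:
    """Encoder stage: read the entire input before any output is produced.

    Toy stand-in for a trained network: the "representation" is simply a
    stored copy of the input.
    """
    return list(tokens)

def decode(representation: List[str], max_len: int = 50) -> List[str]:
    """Decoder stage: emit one token per step until END or max_len."""
    output: List[str] = []
    for step in range(max_len):
        # Toy rule for the reversal task: at step i, emit the i-th token
        # counting from the end; once the input is exhausted, emit END.
        if step < len(representation):
            token = representation[-1 - step]
        else:
            token = END
        if token == END:
            break
        output.append(token)
    return output

print(decode(encode(["a", "b", "c", "d"])))  # ['d', 'c', 'b', 'a']
```

Because the decoder signals its own stop with `END`, the output length is not fixed in advance; a trained decoder learns when to emit that marker rather than following a rule.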
The bottleneck
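In early sequence-to-sequence models, the encoder squeezed the entire input into a single fixed-length vector. Everything the decoder needed had to pass through this bottleneck, so information was lost and output quality degraded as inputs grew longer. Attention later removed the constraint by letting the decoder look back at all encoder states at every step, weighting each state by its relevance to the token being generated.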
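The contrast can be sketched with plain vectors. As an assumption for illustration, the fixed-length bottleneck is modeled as the mean of the encoder states (early models typically used the encoder's final hidden state instead), and attention uses dot-product scoring, one common choice among several. The functions `bottleneck`, `softmax`, and `attend` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]

def bottleneck(states: List[Vector]) -> Vector:
    """Fixed-length summary: average all encoder states into one vector.

    The decoder sees only this vector, whatever the input length.
    """
    dim = len(states[0])
    return [sum(s[d] for s in states) / len(states) for d in range(dim)]

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query: Vector, states: List[Vector]) -> Vector:
    """Dot-product attention: a weighted mix of all encoder states.

    The weights depend on the decoder's current query, so each decoding
    step can focus on a different part of the input.
    """
    scores = [sum(q * k for q, k in zip(query, s)) for s in states]
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)]

states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy encoder states
print(bottleneck(states))           # one vector, reused at every step
print(attend([10.0, 0.0], states))  # roughly [1.0, 0.5]
print(attend([0.0, 10.0], states))  # roughly [0.5, 1.0]
```

The bottleneck returns the same summary no matter what the decoder is currently generating, while the attention context shifts with the query, which is what lets long inputs avoid being forced through one vector.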
The decoder is usually autoregressive: each generated token feeds back as input for the next step.
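The feedback loop can be sketched as below. The step function `next_token` is a hypothetical stand-in for the decoder network and is assumed to return a single chosen token (as greedy decoding does after picking the most likely candidate). The toy bigram table and the `START`/`END` markers are illustrative only; a real decoder would also read the encoder's output.

```python
from typing import Callable, Dict, List

START, END = "<s>", "</s>"  # hypothetical boundary tokens

def greedy_decode(next_token: Callable[[List[str]], str],
                  max_len: int = 20) -> List[str]:
    """Autoregressive generation: each chosen token is fed back as input."""
    prefix = [START]
    for _ in range(max_len):
        token = next_token(prefix)  # conditioned on everything emitted so far
        if token == END:
            break
        prefix.append(token)  # feedback: this output is the next step's input
    return prefix[1:]  # drop the START marker

# Toy decoder step: a bigram table keyed on the last generated token.
table: Dict[str, str] = {START: "the", "the": "cat", "cat": "sat", "sat": END}
print(greedy_decode(lambda prefix: table[prefix[-1]]))  # ['the', 'cat', 'sat']
```

The `max_len` cap guards against a step function that never produces `END`.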
Key idea
The encoder-decoder design splits sequence transduction into reading (encoding) and writing (decoding). A fixed-length bottleneck once limited quality on long inputs, until attention removed it.