The Encoder-Decoder

Two stages

An encoder-decoder architecture maps a variable-length input sequence to a variable-length output sequence using two cooperating networks.

The encoder reads the full input and compresses it into a representation.
The decoder generates the output one token at a time from that representation.
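The reading stage can be sketched as a tiny recurrent encoder that folds any number of input tokens into one vector of fixed size. This is a minimal sketch under loud assumptions: the embedding function, the hidden size `HIDDEN = 4`, and the fixed update weights are all hypothetical stand-ins for what a real model would learn.

```python
import math
from typing import List

HIDDEN = 4  # size of the fixed-length representation (assumed for illustration)


def embed(token_id: int) -> List[float]:
    # Hypothetical deterministic embedding; a real model learns this table.
    return [math.sin(token_id * (i + 1)) for i in range(HIDDEN)]


def encode(token_ids: List[int]) -> List[float]:
    """Toy recurrent encoder: fold the whole input into one fixed-length vector."""
    h = [0.0] * HIDDEN
    for t in token_ids:
        x = embed(t)
        # Elman-style update h = tanh(x + 0.5 * h), with weights fixed to keep it tiny.
        h = [math.tanh(xi + 0.5 * hi) for xi, hi in zip(x, h)]
    return h


short_ctx = encode([3, 1])
long_ctx = encode([3, 1, 4, 1, 5, 9, 2, 6])
print(len(short_ctx), len(long_ctx))  # 4 4
```

Whatever the input length, the decoder receives the same four numbers, which is exactly the property the next section examines.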

This pattern powers machine translation, summarization, and speech-to-text.

The bottleneck

In early sequence-to-sequence models, the encoder squeezed the entire input into a single fixed-length vector. Because that one vector had to carry everything, quality dropped as inputs grew longer. Attention later let the decoder look back at all encoder states at every decoding step instead of relying on that single vector.
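The attention step can be sketched as a weighted average over every encoder state, with weights recomputed for each decoder query. This sketch assumes a plain dot-product score, chosen for brevity; other scoring functions exist, and the names `softmax` and `attend` are illustrative rather than any library's API.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: List[float], encoder_states: List[List[float]]) -> List[float]:
    """Dot-product attention: weight each encoder state by its similarity to the query."""
    scores = [sum(q * k for q, k in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector = weighted sum of all encoder states (not just the last one).
    return [sum(w * state[i] for w, state in zip(weights, encoder_states)) for i in range(dim)]


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = attend([10.0, 0.0], states)  # approximately [1.0, 0.5]
```

Here the query matches the first and third states equally, so the context is roughly their average; a different query at the next step would pull from different states.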

The decoder is usually autoregressive: each generated token feeds back as input for the next step.
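The feedback loop can be sketched as greedy decoding: at each step the decoder scores the vocabulary, picks the best token, and passes that token back in as the next input. The token ids `BOS` and `EOS`, the `step` callback, and `toy_step` are hypothetical; a real decoder would be a trained network conditioned on the encoder's representation.

```python
from typing import Callable, List

BOS, EOS = 0, 1  # hypothetical start- and end-of-sequence token ids


def greedy_decode(
    step: Callable[[List[float], int], List[float]],
    representation: List[float],
    max_len: int = 20,
) -> List[int]:
    """Generate tokens one at a time, feeding each output back as the next input."""
    output: List[int] = []
    prev = BOS
    for _ in range(max_len):
        scores = step(representation, prev)  # one score per vocabulary entry
        token = max(range(len(scores)), key=scores.__getitem__)  # greedy choice
        if token == EOS:
            break
        output.append(token)
        prev = token  # autoregressive feedback: this output is the next input
    return output


def toy_step(representation: List[float], prev: int) -> List[float]:
    # Hypothetical step: ignores the representation, starts at token 2,
    # counts up by one, and emits EOS once it has produced token 4.
    vocab = 6
    target = EOS if prev >= 4 else (2 if prev == BOS else prev + 1)
    return [1.0 if i == target else 0.0 for i in range(vocab)]


print(greedy_decode(toy_step, representation=[0.0]))  # [2, 3, 4]
```

Greedy choice keeps the sketch short; beam search is a common alternative that tracks several candidate outputs at once.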

Key idea

An encoder-decoder splits sequence transduction into reading and writing. A fixed-length bottleneck once limited quality, until attention removed it.
