The Encoder-Decoder for Translation
The encoder-decoder architecture, also called sequence-to-sequence (seq2seq), was a breakthrough for machine translation. It maps a sentence in one language to a sentence in another, even when the two differ in length.
The design has two parts. The encoder reads the source sentence one token at a time and compresses it into a representation. The decoder then generates the target sentence one token at a time, conditioned on that representation and on the words it has produced so far.
Generation is autoregressive. At each step the decoder predicts the next token, appends it to the output, and feeds it back as input for the following step. Generation stops when the decoder emits a special end-of-sequence token.
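The loop described above can be sketched in a few lines. This is a minimal illustration, not a real translation model: the names `greedy_decode`, `toy_next_token`, and `TOY_TABLE` are hypothetical, the "decoder" is a lookup table standing in for a trained network, and for brevity it ignores the encoder representation that a real decoder would also condition on. It always picks a single next token (greedy decoding), which is only one of several possible strategies.

```python
from typing import Callable, Dict, List

BOS, EOS = "<bos>", "<eos>"  # assumed start and end markers


def greedy_decode(next_token: Callable[[List[str]], str], max_len: int = 20) -> List[str]:
    """Generate tokens one at a time until EOS appears or max_len is reached."""
    output = [BOS]
    for _ in range(max_len):
        token = next_token(output)  # predict from everything produced so far
        if token == EOS:            # the end token signals when to stop
            break
        output.append(token)        # feed the prediction back as input
    return output[1:]               # drop the start marker


# Toy stand-in for a trained decoder: maps the previous token to the next one.
TOY_TABLE: Dict[str, str] = {BOS: "le", "le": "chat", "chat": "dort", "dort": EOS}


def toy_next_token(prefix: List[str]) -> str:
    return TOY_TABLE[prefix[-1]]


print(greedy_decode(toy_next_token))  # ['le', 'chat', 'dort']
```

The `max_len` cap is a safety net: a model that never emits the end token would otherwise loop forever.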
In the original recurrent version, the encoder squeezed the whole source into a single fixed-size context vector. This created a bottleneck: a sentence of any length had to fit into the same vector, and quality dropped on long inputs.
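To make the bottleneck concrete, here is a toy recurrent encoder in plain Python. It is a sketch under stated assumptions: the weights are random rather than trained, the sizes `EMBED` and `HIDDEN` are arbitrary, and the simple tanh recurrence stands in for whatever cell a real system would use. The point is only that the returned context has the same size for a 3-token and a 50-token input.

```python
import math
import random
from typing import List

random.seed(0)
EMBED, HIDDEN = 4, 8  # arbitrary illustrative sizes


def rand_matrix(rows: int, cols: int) -> List[List[float]]:
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_x = rand_matrix(HIDDEN, EMBED)   # input-to-hidden weights (untrained)
W_h = rand_matrix(HIDDEN, HIDDEN)  # hidden-to-hidden weights (untrained)


def encode(embeddings: List[List[float]]) -> List[float]:
    """Read the tokens one at a time and return only the final hidden state."""
    h = [0.0] * HIDDEN
    for x in embeddings:
        # New state depends on the current token and the previous state.
        h = [
            math.tanh(
                sum(W_x[i][j] * x[j] for j in range(EMBED))
                + sum(W_h[i][k] * h[k] for k in range(HIDDEN))
            )
            for i in range(HIDDEN)
        ]
    return h  # the single fixed-size context vector


short_sentence = rand_matrix(3, EMBED)   # 3 random "token embeddings"
long_sentence = rand_matrix(50, EMBED)   # 50 random "token embeddings"
print(len(encode(short_sentence)), len(encode(long_sentence)))  # 8 8
```

Whatever the source length, the decoder receives the same eight numbers, so a longer sentence has to be packed into the same space.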
The division of labor is:
- The encoder handles understanding the source.
- The decoder handles producing fluent target text.
- A shared training objective ties them together, so both parts are trained jointly.
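The text does not spell out the training objective. A common choice for models of this kind, stated here as an assumption, is to minimize the negative log-likelihood of the reference translation given the source. The symbols are introduced for illustration: x is the source sentence, y_1 ... y_T are the target tokens, y_{<t} are the target tokens before position t, and θ collects the parameters of both encoder and decoder.

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(y_t \mid y_{<t},\, x\right)
```

Because the same loss depends on both sets of parameters, its gradient flows through the decoder and back into the encoder, which is one sense in which a single objective ties the two parts together.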
The fixed-vector bottleneck motivated the next big idea, attention, which lets the decoder look back at all encoder states instead of relying on a single summary.
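A rough sketch of what "looking back at all encoder states" can mean: the decoder scores every encoder state against its current state, turns the scores into weights with a softmax, and takes the weighted average as its context for that step. The function name `attend` and the example vectors are hypothetical, and dot-product scoring is just one simple option chosen here for brevity; other scoring functions exist.

```python
import math
from typing import List, Tuple


def attend(query: List[float], encoder_states: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for one decoder step."""
    # Score each encoder state by its dot product with the decoder query.
    scores = [sum(q * s for q, s in zip(query, state)) for state in encoder_states]
    # Softmax, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context is the weighted average of all encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Three made-up encoder states and one made-up decoder query.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([0.0, 2.0], states)
print(weights)
print(context)
```

Unlike the single summary vector, the context is recomputed at every decoding step, so different output tokens can draw on different parts of the source.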
Key idea
The encoder-decoder compresses a source sentence and generates the target token by token, but a single fixed-size context vector becomes a bottleneck on long inputs.