The Encoder-Decoder for Translation

The encoder-decoder architecture, also called sequence-to-sequence, was a breakthrough for machine translation. It maps a sentence in one language to a sentence in another, even when the two have different lengths.

The design has two parts. The encoder reads the source sentence one token at a time and compresses it into a representation. The decoder then generates the target sentence one token at a time, conditioned on that representation and on the tokens it has produced so far.
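The two-part design can be written as a probability factorization. This is a sketch in notation the text itself does not use: x_1..x_S are the source tokens, y_1..y_T the target tokens, c the encoder's representation, and Enc is just a stand-in name for the encoder.

```latex
c = \mathrm{Enc}(x_1, \dots, x_S), \qquad
p(y_1, \dots, y_T \mid x) = \prod_{t=1}^{T} p(y_t \mid y_1, \dots, y_{t-1}, c)
```

Each factor is one decoder step: the probability of the next target token given the representation and everything generated before it. Nothing forces S and T to be equal, which is why source and target may differ in length.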

Generation is autoregressive. At each step the decoder predicts the next token, appends it to the output, and feeds it back as input for the following step. A special end token signals when to stop.
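The loop described above can be sketched in a few lines. This is a minimal illustration, not a real translation model: the names greedy_decode, toy_next_token, TOY_LEXICON and the "<eos>" marker are all hypothetical, and the "decoder" is a lookup table standing in for a trained network. A max_len cap is added as a safety stop in case the end token never appears.

```python
from typing import Callable, List

END = "<eos>"  # hypothetical end-of-sequence marker

def greedy_decode(
    next_token: Callable[[List[str], List[str]], str],
    source: List[str],
    max_len: int = 20,
) -> List[str]:
    """Generate target tokens one at a time until END or max_len."""
    target: List[str] = []
    for _ in range(max_len):
        # Predict from the source and the tokens produced so far.
        token = next_token(source, target)
        if token == END:
            break
        # Append the prediction; it becomes input for the next step.
        target.append(token)
    return target

# Toy stand-in for a trained decoder: translate position by position.
TOY_LEXICON = {"le": "the", "chat": "cat", "dort": "sleeps"}

def toy_next_token(source: List[str], target: List[str]) -> str:
    i = len(target)
    return TOY_LEXICON[source[i]] if i < len(source) else END

result = greedy_decode(toy_next_token, ["le", "chat", "dort"])
print(result)  # ['the', 'cat', 'sleeps']
```

A real decoder would return a probability distribution over the vocabulary at each step; picking the single most likely token, as assumed here, is only one way to choose from it.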

In the original recurrent version, the encoder squeezed the whole source into a single fixed-size context vector. This created a bottleneck: a long sentence had to fit into the same vector as a short one, and translation quality dropped on long inputs.
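To make the bottleneck concrete, here is a toy recurrent encoder. It is a sketch under loud assumptions: HIDDEN, embed and encode are hypothetical names, the embedding is an arbitrary deterministic function of the characters, and the update rule is a simplified untrained recurrence rather than a real RNN with learned weight matrices. The only point is that the output has the same size whatever the input length.

```python
import math
from typing import List

HIDDEN = 4  # assumed size of the context vector

def embed(token: str) -> List[float]:
    """Hypothetical embedding: derive HIDDEN numbers from the characters."""
    total = sum(ord(ch) for ch in token)
    return [math.sin(total * (k + 1)) for k in range(HIDDEN)]

def encode(tokens: List[str]) -> List[float]:
    """Read tokens one at a time, folding each into the running state."""
    h = [0.0] * HIDDEN
    for tok in tokens:
        x = embed(tok)
        # Simplified recurrence: the new state mixes the old state and the input.
        h = [math.tanh(0.5 * h_k + x_k) for h_k, x_k in zip(h, x)]
    return h  # the final state serves as the context vector

short = encode(["le", "chat"])
long = encode(["le", "chat", "noir"] * 10)
print(len(short), len(long))  # 4 4
```

Two tokens or thirty, the decoder receives four numbers. Everything it needs to know about the source has to survive in that vector.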

To summarize the division of labor:

The encoder handles understanding the source.
The decoder handles producing fluent target text.
A shared training objective ties them together.

This bottleneck problem motivated the next big idea, attention, which lets the decoder look back at all encoder states instead of a single summary.
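The idea of looking back at all encoder states can be sketched as a weighted average. The example below assumes dot-product scoring, chosen here only for brevity (other scoring functions exist), and the names softmax, attend, encoder_states and the example vectors are hypothetical.

```python
import math
from typing import List, Tuple

def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(
    decoder_state: List[float],
    encoder_states: List[List[float]],
) -> Tuple[List[float], List[float]]:
    """Score every encoder state against the decoder state, then average."""
    scores = [
        sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    # The context is a weighted sum over all encoder states.
    context = [
        sum(w * h[k] for w, h in zip(weights, encoder_states))
        for k in range(dim)
    ]
    return weights, context

# One encoder state per source token (made-up values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([0.0, 2.0], encoder_states)
print(weights)
print(context)
```

Because the weights are recomputed at every decoding step, the decoder gets a fresh context each time instead of one summary for the whole sentence.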

Key idea

The encoder-decoder compresses a source sentence and generates the target word by word, but a single fixed-size context vector bottlenecks long inputs.
