Machine Learning

Sequence to Sequence Models

Models that turn one sequence into another using an encoder and a decoder.

What it is

A sequence to sequence model maps an input sequence to an output sequence that may have a different length. Translation, summarization, and speech transcription all fit this shape.

Two parts

The classic design has two networks.

  • The encoder reads the whole input and compresses it into a context representation
  • The decoder generates the output one token at a time, conditioned on that context and on the tokens it has produced so far

The decoder starts from a start token and feeds each prediction back in until it emits an end token. This step by step generation is called autoregressive decoding.
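The decoding loop described above can be sketched in a few lines. This is a minimal illustration, not a real model: `greedy_decode`, `toy_next_token`, and the `<s>` / `</s>` token strings are hypothetical names chosen for this example, and the "decoder" is a stand-in that simply emits the input in reverse. A `max_len` cap is added as a safety stop in case the end token never appears.

```python
from typing import Callable, List, Sequence

START, END = "<s>", "</s>"  # hypothetical special tokens


def greedy_decode(
    context: Sequence[str],
    next_token: Callable[[Sequence[str], List[str]], str],
    max_len: int = 20,
) -> List[str]:
    """Generate tokens one at a time until END is emitted or max_len is hit."""
    output = [START]
    for _ in range(max_len):
        # Each step is conditioned on the context and on the tokens so far.
        token = next_token(context, output)
        if token == END:
            break
        output.append(token)  # feed the prediction back in
    return output[1:]  # drop the start token


def toy_next_token(context: Sequence[str], prefix: List[str]) -> str:
    """Stand-in for a trained decoder: emits the input reversed, then END."""
    produced = len(prefix) - 1  # tokens emitted so far, excluding START
    if produced < len(context):
        return context[len(context) - 1 - produced]
    return END


result = greedy_decode(["a", "b", "c"], toy_next_token)
print(result)  # ['c', 'b', 'a']
```

In a trained model, `next_token` would pick the most likely token from the decoder's predicted distribution. Always taking the top choice is called greedy decoding, and it is only one of several ways to choose the next token.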

The bottleneck and attention

In the earliest version, the encoder squeezed the entire input into a single fixed-length vector. That vector became a bottleneck for long inputs, because it had to hold more information without growing in size.
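A toy sketch makes the bottleneck concrete. The function `encode_fixed` and its update rule are assumptions made up for this illustration, not the formula of any real encoder. The point is only that the returned vector has the same size whether the input has 3 steps or 300.

```python
import math
from typing import List


def encode_fixed(inputs: List[List[float]], hidden_size: int = 4) -> List[float]:
    """Toy recurrent encoder: folds a whole sequence into one fixed-size vector."""
    h = [0.0] * hidden_size
    for x in inputs:
        # Hypothetical update rule: mix the previous state with the current input.
        h = [math.tanh(0.5 * h[i] + x[i % len(x)]) for i in range(hidden_size)]
    return h


short = encode_fixed([[0.1, 0.2]] * 3)
long = encode_fixed([[0.1, 0.2]] * 300)
print(len(short), len(long))  # 4 4
```

However long the input gets, the decoder in this design only ever sees those `hidden_size` numbers.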

  • Attention fixed this by letting the decoder look back at all encoder states
  • At each output step the decoder assigns a weight to every input position, so the most relevant positions contribute most to the next prediction
  • This made translation of long sequences more accurate and inspired the transformer, which is built around attention
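One decoder step of attention can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `attend` is hypothetical, the scores are plain dot products (early attention models used a small learned scoring network instead), and the vectors are hand-picked rather than learned.

```python
import math
from typing import List, Tuple


def attend(
    query: List[float], encoder_states: List[List[float]]
) -> Tuple[List[float], List[float]]:
    """Return (weights, context) for one decoder step."""
    # Score each input position against the decoder's current state.
    scores = [sum(q * s for q, s in zip(query, state)) for state in encoder_states]
    # Softmax turns scores into weights that sum to 1 (max subtracted for stability).
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context is the weighted sum of all encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one vector per input position
query = [0.0, 2.0]  # assumed decoder state for this step
weights, context = attend(query, encoder_states)
print(weights)
print(context)
```

Here the second and third positions match the query better than the first, so they receive larger weights. The context is recomputed at every output step, so the decoder is no longer limited to a single fixed summary of the input.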

Key idea

A sequence to sequence model uses an encoder to turn the input into context, then a decoder generates the output token by token.

Check yourself

1. What does the decoder in a sequence to sequence model do?

2. What problem did attention solve in these models?