Encoder Only Versus Decoder Only Versus Encoder Decoder

Three transformer shapes and the tasks each one fits.

Three blueprints

The same transformer block assembles into three major families, each suited to different tasks.

Encoder only

  • Every token attends to all other tokens, both left and right.
  • Produces rich contextual representations of an input.
  • Best for understanding tasks like classification and retrieval.
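The bidirectional attention described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full encoder: it assumes a single head with identity query/key/value projections, and the names `softmax`, `self_attention`, and `tokens` are hypothetical choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Simplifying assumption: queries, keys, and values are all x itself
    # (no learned projections, one head).
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)   # (n, n): every token scored against every token
    weights = softmax(scores)       # no mask, so attention runs left and right
    return weights @ x, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))    # 4 toy token embeddings of dimension 8
outputs, weights = self_attention(tokens)
```

Because no mask is applied, every entry of `weights` is positive: the first token draws on the last one just as the last draws on the first.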

Decoder only

  • Uses a causal mask so each token attends only to itself and the tokens before it, never to later ones.
  • Generates text one token at a time.
  • Best for generation, and the basis of most large language models today.
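A causal mask and a token-by-token loop might look like the sketch below. It assumes random toy scores in place of real attention scores, and `toy_next_token` is a hypothetical stand-in for a trained model; all function names are chosen for illustration only.

```python
import numpy as np

def causal_mask(n):
    # True where query position i may attend to key position j (j <= i).
    return np.tril(np.ones((n, n), dtype=bool))

def masked_softmax(scores, mask):
    # Disallowed positions get -inf, which becomes weight 0 after exp.
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))                  # toy attention scores
weights = masked_softmax(scores, causal_mask(4))  # zero above the diagonal

def toy_next_token(prefix, vocab_size=5):
    # Hypothetical "model": the next token depends only on the prefix.
    return (sum(prefix) + len(prefix)) % vocab_size

def generate(prompt, steps):
    # Append one token at a time, each conditioned on everything so far.
    seq = list(prompt)
    for _ in range(steps):
        seq.append(toy_next_token(seq))
    return seq

generated = generate([1, 2], steps=3)
```

The first row of `weights` puts all of its mass on position 0, since the first token has no past to look at, and `generate` never revises a token once it has been emitted.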

Encoder decoder

  • An encoder reads the input bidirectionally, and a decoder generates the output causally, using cross attention to look into the encoder's outputs.
  • Best for sequence to sequence tasks like translation and summarization.
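Cross attention can be sketched the same way: queries come from the decoder, while keys and values come from the encoder. The example below is a simplified illustration that assumes a single head with no learned projections; `cross_attention`, `encoder_states`, and `decoder_states` are hypothetical names, and the random arrays stand in for real hidden states.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_states):
    # Queries: decoder positions. Keys and values: encoder positions.
    d = decoder_states.shape[-1]
    scores = decoder_states @ encoder_states.T / np.sqrt(d)  # (T_out, T_in)
    weights = softmax(scores)  # no causal mask: the whole input is visible
    return weights @ encoder_states, weights

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(6, 8))  # 6 input tokens, fully encoded
decoder_states = rng.normal(size=(3, 8))  # 3 output tokens generated so far
context, weights = cross_attention(decoder_states, encoder_states)
```

The weight matrix has one row per output position and one column per input position, so each generated token can draw on the entire input. The causal mask applies only to the decoder's own self attention.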

Choosing one

If you only need to read and label, use an encoder. If you need to write, use a decoder. If you transform one sequence into another, the encoder decoder pairing gives you both a full view of the input and a causal generator.

Key idea

Encoder only models read bidirectionally for understanding, decoder only models generate causally, and encoder decoder models pair a bidirectional reader with a causal writer for sequence to sequence work.

Check yourself

1. Which family is the basis of most large language models today?

2. Why does an encoder only model suit understanding tasks?

3. What connects the decoder to the encoder in an encoder decoder model?