Encoder-Only Versus Decoder-Only Versus Encoder-Decoder

Three blueprints

The same transformer block assembles into three major families, each suited to different tasks.

Encoder-only

Every token attends to all other tokens, both left and right.
Produces rich contextual representations of an input.
Best for understanding tasks like classification and retrieval.
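
To make "both left and right" concrete, here is a minimal sketch of a bidirectional attention mask in plain Python. The names bidirectional_mask and visible_positions are illustrative helpers, not part of any library, and a real model would apply the mask to attention scores rather than to a list of booleans.

```python
def bidirectional_mask(n):
    """Return an n x n mask; mask[i][j] is True if token i may attend to token j."""
    # Encoder-only models place no restriction: every position is visible.
    return [[True] * n for _ in range(n)]


def visible_positions(mask, i):
    """List the positions that token i is allowed to attend to."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


full_mask = bidirectional_mask(4)
# Token 1 sees the token to its left (0), itself (1), and the tokens to its right (2, 3).
middle_view = visible_positions(full_mask, 1)
print(middle_view)
```

Because nothing is hidden, the representation of each token can draw on the whole input, which is what makes these models suited to classification and retrieval.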

Decoder-only

Uses a causal mask so each token attends only to itself and the tokens before it.
Generates text one token at a time.
Best for generation, and the basis of most large language models today.
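
The causal mask and the token-by-token loop can be sketched in plain Python as follows. This is an illustration under simplifying assumptions: causal_mask, masked_softmax, and generate are hypothetical helper names, and next_token stands in for a trained model's prediction step.

```python
import math


def causal_mask(n):
    """mask[i][j] is True only when j <= i, so token i cannot see later tokens."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, allowed):
    """Softmax over the allowed positions; masked positions get weight 0."""
    peak = max(s for s, a in zip(scores, allowed) if a)
    exps = [math.exp(s - peak) if a else 0.0 for s, a in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


def generate(next_token, prompt, max_new_tokens):
    """Autoregressive loop: each new token is predicted from the tokens so far."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


mask = causal_mask(4)
# Token 1 has equal raw scores everywhere, but only positions 0 and 1 are visible.
weights = masked_softmax([0.0, 0.0, 0.0, 0.0], mask[1])

# Toy stand-in for a model: always predicts the last token plus one.
sequence = generate(lambda tokens: tokens[-1] + 1, [1, 2], 3)
```

Here weights places all attention on positions 0 and 1, and sequence grows by one token per step, each step seeing only what came before it.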

Encoder-decoder

An encoder reads the input bidirectionally, and a decoder generates the output causally, using cross-attention to consult the encoder's representations.
Best for sequence-to-sequence tasks like translation and summarization.
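
Cross-attention can be sketched in a few lines of plain Python. This is a simplified illustration, not a full implementation: it omits the learned query, key, and value projections and uses the encoder states directly as both keys and values. The function names and the toy vectors are assumptions made for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(decoder_states, encoder_states):
    """For each decoder position, return a weighted mix of ALL encoder states."""
    dim = len(encoder_states[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in decoder_states:
        # No mask here: the decoder may look at the entire encoded input.
        weights = softmax([dot(query, key) / scale for key in encoder_states])
        mixed = [
            sum(w * value[d] for w, value in zip(weights, encoder_states))
            for d in range(dim)
        ]
        outputs.append(mixed)
    return outputs


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three source tokens
decoder_states = [[0.0, 0.0], [2.0, 0.0]]              # two target positions
context = cross_attention(decoder_states, encoder_states)
```

The output has one context vector per decoder position, each built from every encoder position. The causal restriction applies only to the decoder's self-attention over its own output, not to this lookup into the input.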

Choosing one

If you only need to read and label, use an encoder-only model. If you need to write, use a decoder-only model. If you transform one sequence into another, the encoder-decoder pairing gives you both a full view of the input and a causal generator.

Key idea

Encoder-only models read bidirectionally for understanding, decoder-only models generate causally, and encoder-decoder models pair a bidirectional reader with a causal writer for sequence-to-sequence work.