Machine Learning

Positional Encoding

Giving order back to a model that sees tokens as an unordered set.

The missing order

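Self-attention treats its input as an unordered collection of tokens: shuffling the input tokens just shuffles the outputs in the same way, so by itself the mechanism has no sense of which token comes first. Yet word order carries meaning: "the dog bit the man" differs from "the man bit the dog."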
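The claim can be checked directly. Below is a minimal single-head self-attention in plain Python, assuming no learned projections (queries, keys and values are the raw token vectors) and three hypothetical 2-dimensional embeddings standing in for "dog", "bit" and "man". Reversing the sentence only reverses the order of the outputs.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention with no learned
    projections: queries, keys and values are the token vectors themselves."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Output for this token: attention-weighted average of all value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

def close(u, v):
    """Compare two vectors up to floating-point rounding."""
    return all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12) for x, y in zip(u, v))

# Hypothetical 2-d embeddings (no positional information yet).
dog, bit, man = [1.0, 0.0], [0.5, 0.5], [0.0, 1.0]

a = self_attention([dog, bit, man])  # "dog bit man"
b = self_attention([man, bit, dog])  # "man bit dog"

# "dog" receives the same output vector in both orders; so does "man".
print(close(a[0], b[2]), close(a[2], b[0]))  # True True
```

Adding learned query, key and value projections does not change this, because they are applied to each token independently.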

Adding position information

Positional encoding injects order by adding a position-dependent vector to each token embedding.

  • The original transformer used fixed sinusoidal patterns of different frequencies
  • Each position gets a unique signature the model can read
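  • Many later models instead learn a position vector for each index, or use rotary position embeddings (RoPE), which rotate query and key vectors by position-dependent angles instead of adding anything to the embeddings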
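A minimal sketch of the fixed sinusoidal scheme, assuming the formulation from the original transformer paper: dimension 2i of position pos holds sin(pos / 10000^(2i/d)) and dimension 2i+1 holds the matching cosine. The helper names `sinusoidal_encoding` and `add_positions` and the toy embedding are hypothetical.

```python
import math

def sinusoidal_encoding(num_positions, d_model, base=10000.0):
    """Return a num_positions x d_model table of fixed position vectors.
    Even dimensions use sine, odd dimensions cosine; each sine/cosine pair
    shares one frequency, and frequencies fall geometrically with depth."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            pair = i // 2  # index of the sin/cos pair this dimension belongs to
            angle = pos / base ** (2 * pair / d_model)
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(embeddings, table):
    """Add the position vector for index t to the token embedding at index t."""
    return [[e + p for e, p in zip(emb, table[t])] for t, emb in enumerate(embeddings)]

pe = sinusoidal_encoding(num_positions=50, d_model=8)

# Every position gets a distinct signature.
print(len({tuple(row) for row in pe}))  # 50

# The same (hypothetical) token vector now differs at position 0 and position 3.
dog = [0.1] * 8
x = add_positions([dog, dog, dog, dog], pe)
print(x[0] == x[3])  # False
```

Because the table is computed rather than learned, it can be generated for any sequence length.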

How the model uses it

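Because position vectors are added to the embeddings, attention scores can depend on both content and location. The model can learn rules such as "attend to the previous token" or "focus on the start of the sentence." Relative schemes, which make scores depend on the distance between two tokens rather than on their absolute positions (rotary embeddings have this property), are popular partly because they can help models handle sequences longer than those seen in training.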
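Rotary embeddings illustrate the relative idea. This sketch assumes the common RoPE formulation: each consecutive pair of dimensions of a query or key at position pos is rotated by pos times a pair-specific frequency (base 10000 here). A dot product of two rotated vectors depends only on the difference of their rotation angles, so the score depends only on the offset between the two positions. The `rotate` and `dot` helpers and the vectors are hypothetical.

```python
import math

def rotate(vec, pos, base=10000.0):
    """RoPE-style sketch: rotate each consecutive pair of dimensions
    (i, i+1), i even, by the angle pos * base**(-i / d)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q = [0.3, -1.2, 0.8, 0.5]  # hypothetical query vector
k = [1.0, 0.4, -0.6, 0.9]  # hypothetical key vector

# Same offset (query 3 positions after the key), very different absolute positions.
s1 = dot(rotate(q, 5), rotate(k, 2))
s2 = dot(rotate(q, 105), rotate(k, 102))
print(math.isclose(s1, s2, rel_tol=1e-9, abs_tol=1e-9))  # True
```

Nothing is added to the embeddings here; position enters only through the rotation applied just before the query-key dot product.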

Key idea

Positional encoding adds order information to token embeddings so attention can use both what a token is and where it sits.
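Writing out the attention score for additive encodings shows where the two kinds of information meet. A sketch, assuming single-head attention with row vectors x_i (token embedding) and p_i (position vector), projection matrices W_Q and W_K, and key dimension d_k; these symbols are introduced here for illustration.

```latex
s_{ij} = \frac{(x_i + p_i) W_Q \,\big((x_j + p_j) W_K\big)^\top}{\sqrt{d_k}}
       = \frac{1}{\sqrt{d_k}} \Big(
           \underbrace{x_i W_Q W_K^\top x_j^\top}_{\text{content-content}}
         + \underbrace{x_i W_Q W_K^\top p_j^\top}_{\text{content-position}}
         + \underbrace{p_i W_Q W_K^\top x_j^\top}_{\text{position-content}}
         + \underbrace{p_i W_Q W_K^\top p_j^\top}_{\text{position-position}}
         \Big)
```

The first term reflects only what the tokens are, the last only where they sit, and the mixed terms let content and location interact.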

Check yourself

1. Why does a transformer need positional encoding?

2. What kind of pattern did the original transformer use for positions?