← Lessons

quiz vs the machine

Gold1480

Machine Learning

Rotary Position Embeddings

The relative position scheme behind most modern language models.

5 min read · core · beat Gold to climb

Rotating queries and keys

Rotary position embeddings, or RoPE, encode position by rotating the query and key vectors by an angle that depends on their position. Pairs of dimensions are treated as points on a plane and spun by a position dependent angle.

Why rotation encodes relativity

When you take the dot product of a rotated query and a rotated key, the result depends only on the difference of their positions, not the absolute values. So attention scores naturally become a function of relative distance.

What makes it popular

  • Position enters through multiplication on queries and keys, not addition to embeddings.
  • It generalizes more gracefully to sequences longer than those seen in training.
  • Different frequency bands rotate at different rates, capturing near and far relationships.

Applied inside attention

RoPE is applied right before the dot product in each attention head, so values are left untouched. This keeps the content of a token separate from how its position influences matching.

Key idea

Rotary embeddings rotate queries and keys by a position dependent angle so their dot product depends only on relative distance, giving a clean relative position scheme that extends well to longer sequences.

Check yourself

Answer to earn rating on the learn ladder.

1. What property emerges from rotating queries and keys in RoPE?

2. Where in the attention computation is RoPE applied?