Machine Learning

The Embedding Lookup

How integer token ids become the dense vectors the network processes.

From id to vector

A token id is just an index. The first thing the model does is look that index up in an embedding matrix, retrieving a learned dense vector for the token.

The embedding matrix

The matrix has one row per vocabulary entry and one column per model dimension. With a vocabulary of 50,000 tokens and a model width of 1,000, that is 50,000 × 1,000 = 50 million parameters before any other layer exists.

  • Each row starts random and is learned during training.
  • The lookup is a simple row selection. It gives the same result as multiplying a one-hot vector by the matrix, but skips the multiply.
  • Tokens that appear in similar contexts drift toward similar vectors as training proceeds.
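
The lookup can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular model's code: the toy sizes, the 0.02 initialization scale, and the names `embedding` and `token_ids` are assumptions made for the example.

```python
import numpy as np

# Hypothetical toy sizes; the lesson's example would be 50_000 x 1_000.
vocab_size, d_model = 10, 4

rng = np.random.default_rng(0)
# One row per vocabulary entry, one column per model dimension.
# Rows start random (the 0.02 scale is an assumed, commonly seen choice).
embedding = rng.normal(scale=0.02, size=(vocab_size, d_model))

token_ids = np.array([3, 7, 3])

# The lookup itself: plain row selection by integer index.
vectors = embedding[token_ids]            # shape (3, d_model)

# The same result via a one-hot matrix multiply, which does far more work.
one_hot = np.eye(vocab_size)[token_ids]   # shape (3, vocab_size)
vectors_via_matmul = one_hot @ embedding

assert np.allclose(vectors, vectors_via_matmul)
# The same id always selects the same row.
assert np.array_equal(vectors[0], vectors[2])

# Parameter count for the lesson's example sizes.
print(50_000 * 1_000)  # 50000000
```

Indexing touches only the selected rows, while the one-hot version multiplies through every row of the matrix to reach the same answer.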

Tied weights

Many models reuse the same matrix for the output projection that turns final hidden states back into token scores. This weight tying removes the need for a second vocabulary-by-width matrix and often helps quality.
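
As a rough sketch of weight tying, the output scores can be computed by multiplying the final hidden states by the transpose of the embedding matrix. The sizes, the random `hidden` stand-in, and the variable names are assumptions for illustration; real models may also add a bias or scaling.

```python
import numpy as np

vocab_size, d_model = 10, 4
rng = np.random.default_rng(0)
embedding = rng.normal(scale=0.02, size=(vocab_size, d_model))

# Stand-in for the final hidden states at 3 positions (random here).
hidden = rng.normal(size=(3, d_model))

# Tied output projection: reuse the embedding matrix, transposed,
# so each token's score is the dot product of its row with the hidden state.
logits = hidden @ embedding.T             # shape (3, vocab_size)

# Parameter cost with and without a separate output matrix.
untied_params = 2 * vocab_size * d_model  # input matrix + output matrix
tied_params = vocab_size * d_model        # one shared matrix
print(untied_params - tied_params)        # 40
```

At the lesson's example sizes, the same arithmetic gives a saving of 50 million parameters.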

Why it matters for tokenization

A row only receives a gradient when its token appears in a training batch, so rare tokens get few updates and their embeddings stay poorly trained. This is a direct link between vocabulary choices and how well individual tokens are represented.
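
The effect can be illustrated with the backward pass of a lookup, which scatter-adds each position's gradient into the row it selected. This is a simplified sketch: the batch, the all-ones upstream gradient, and the names are made up for the example, and it ignores optimizer details such as weight decay.

```python
import numpy as np

vocab_size, d_model = 5, 3

# Hypothetical batch: token 0 is common, tokens 3 and 4 never appear.
token_ids = np.array([0, 0, 0, 1, 0, 2, 0, 1])

# Pretend upstream gradient for each looked-up vector (all ones for simplicity).
grad_vectors = np.ones((len(token_ids), d_model))

# Backward pass of the lookup: accumulate each gradient into its token's row.
# np.add.at handles repeated indices correctly, unlike plain `+=` indexing.
grad_embedding = np.zeros((vocab_size, d_model))
np.add.at(grad_embedding, token_ids, grad_vectors)

updates_per_row = np.bincount(token_ids, minlength=vocab_size)
print(updates_per_row)        # [5 2 1 0 0]
print(grad_embedding[:, 0])   # [5. 2. 1. 0. 0.]
```

Rows 3 and 4 receive exactly zero gradient from this batch, so a plain gradient step would leave them at their initial values.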

Key idea

Each token id selects a learned row from the embedding matrix, turning integers into dense vectors, and rare tokens get poorly trained rows.

Check yourself

1. What is the embedding lookup?

2. What does weight tying refer to here?