Machine Learning

The Embedding And Unembedding

How tokens turn into vectors and vectors turn back into tokens.

Both ends of the model

A transformer works on continuous vectors, but text is a sequence of discrete tokens. The embedding maps each input token to a vector, and the unembedding maps each final vector back to scores over the vocabulary.

The embedding lookup

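  • The vocabulary is a fixed list of tokens, each assigned an integer id.
  • The embedding matrix has one row per token in the vocabulary and one column per model dimension.
  • Looking up a token id selects the matching row, which is that token's vector.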
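
The lookup described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular library implements it: the four-word vocabulary, the three-dimensional vectors, and the names `vocab`, `token_to_id`, `embedding`, and `embed` are all made up for the example.

```python
# Toy vocabulary (hypothetical): a token's position in the list is its id.
vocab = ["the", "cat", "sat", "mat"]
token_to_id = {tok: i for i, tok in enumerate(vocab)}

# Embedding matrix: one row per token, three columns (the model dimension).
# The numbers are arbitrary; in a real model they are learned.
embedding = [
    [0.1, 0.2, 0.3],    # row 0: "the"
    [0.4, -0.5, 0.6],   # row 1: "cat"
    [-0.7, 0.8, 0.9],   # row 2: "sat"
    [1.0, 1.1, -1.2],   # row 3: "mat"
]

def embed(token_ids):
    """Select one embedding row per token id."""
    return [embedding[i] for i in token_ids]

ids = [token_to_id[t] for t in ["the", "cat", "sat"]]
vectors = embed(ids)  # three vectors, one per input token
```

Note that the lookup is pure row selection: no arithmetic happens until the vectors enter the rest of the model.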

The unembedding projection

At the output, the model holds one final vector per position. The unembedding multiplies each of these vectors by a matrix with one column per token, producing one logit for every vocabulary entry. Softmax then turns those logits into a probability distribution over the next token.
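
As a rough sketch of the projection and softmax for a single position, assuming a model dimension of 3 and a vocabulary of 4 tokens: the matrix `W_U`, the vector `h`, and the helper names `unembed` and `softmax` are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def unembed(h, W_U):
    """Multiply vector h (length d_model) by W_U (d_model rows, one column per token)."""
    vocab_size = len(W_U[0])
    return [sum(h[k] * W_U[k][j] for k in range(len(h))) for j in range(vocab_size)]

def softmax(logits):
    """Turn logits into probabilities; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up unembedding matrix: 3 rows (d_model) by 4 columns (vocabulary size).
W_U = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
h = [2.0, 1.0, 0.0]          # final vector at one position (made up)

logits = unembed(h, W_U)     # one score per vocabulary entry: [2.0, 1.0, 0.0, 3.0]
probs = softmax(logits)      # non-negative values that sum to 1
```

Here the last token has the largest logit, so it also receives the highest probability after softmax.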

A symmetric pair

The embedding turns ids into vectors at the bottom of the model, and the unembedding turns vectors into per-token scores at the top. Everything between them, the attention and feed-forward layers, operates purely in vector space.
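
To make the bracketing concrete, here is a hedged end-to-end sketch. The `middle` function is only a stand-in for the attention and feed-forward layers (it simply doubles each vector), and the matrices `W_E` and `W_U` hold arbitrary illustrative values; the point is the types at each stage, not the numbers.

```python
def embed(token_ids, W_E):
    """Ids in, vectors out: select one row of W_E per token id."""
    return [W_E[i] for i in token_ids]

def middle(vectors):
    """Placeholder for attention and feed-forward layers: vectors in, vectors out."""
    return [[2.0 * x for x in v] for v in vectors]

def unembed(vectors, W_U):
    """Vectors in, logits out: one score per vocabulary token at each position."""
    vocab_size = len(W_U[0])
    return [
        [sum(v[k] * W_U[k][j] for k in range(len(v))) for j in range(vocab_size)]
        for v in vectors
    ]

W_E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 tokens, model dimension 2 (made up)
W_U = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.5]]    # model dimension 2, 3 tokens (made up)

token_ids = [0, 2]                            # discrete input
hidden = middle(embed(token_ids, W_E))        # continuous vectors throughout
logits = unembed(hidden, W_U)                 # back to one score per token id
```

Only the first and last steps touch token ids; `middle` never sees anything but lists of floats.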

Key idea

The embedding maps discrete token ids to vectors at the input, and the unembedding projects final vectors to a logit per vocabulary token at the output, bracketing a model that works entirely in continuous vector space.

Check yourself

1. What does the embedding matrix do?

2. What does the unembedding produce before softmax?