Machine Learning

Embedding Layers

Lookup tables turn discrete tokens into learnable dense vectors.

From symbols to vectors

An embedding layer maps each discrete token, such as a word or category, to a dense learnable vector.

  • It is implemented as a lookup table indexed by token id.
  • Each row is a vector trained by gradient descent like any other parameter.
  • Tokens that appear in similar contexts tend to end up with similar vectors.
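
To make the lookup concrete, here is a minimal sketch in plain Python. The toy vocabulary, the width `dim`, and the helper names `embed` and `sgd_step` are all hypothetical, chosen only for illustration. Real frameworks store the table as a single weight matrix and compute gradients automatically.

```python
import random

random.seed(0)

vocab = {"cat": 0, "dog": 1, "car": 2}  # hypothetical toy vocabulary
dim = 4                                  # embedding width, chosen arbitrarily

# One row per token id; rows start as small random values.
table = [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in vocab]

def embed(token_ids):
    """Return the table row for each token id."""
    return [table[i] for i in token_ids]

def sgd_step(token_id, grad, lr=0.1):
    """Apply one gradient-descent step to the row that was looked up."""
    table[token_id] = [w - lr * g for w, g in zip(table[token_id], grad)]

ids = [vocab["cat"], vocab["car"]]
vectors = embed(ids)  # two vectors, each of length dim
```

Only rows whose tokens appear in a batch receive a gradient, so a step like `sgd_step(vocab["cat"], grad)` leaves the rows for "dog" and "car" unchanged.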

Why not one-hot

A one-hot vector is as long as the vocabulary and almost entirely zeros, and it places every pair of distinct tokens at the same distance. Embeddings are compact and dense, and training can place related tokens near each other in the vector space.
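
The comparison can be checked with a small sketch. The table values and the helpers `one_hot`, `matvec`, and `dist` below are made up for illustration. The sketch shows two things: multiplying a one-hot vector by the table just selects a row, which is why a direct lookup is used instead, and every pair of distinct one-hot vectors is the same distance apart.

```python
import math

vocab_size, dim = 5, 3
# Hypothetical embedding table with easy-to-read values.
table = [[float(r * dim + c) for c in range(dim)] for r in range(vocab_size)]

def one_hot(i, n):
    """Vector of length n with a single 1.0 at index i."""
    return [1.0 if j == i else 0.0 for j in range(n)]

def matvec(vec, mat):
    """Row vector times matrix."""
    return [sum(v * row[c] for v, row in zip(vec, mat))
            for c in range(len(mat[0]))]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# A one-hot vector times the table selects one row, same as a lookup.
selected = matvec(one_hot(2, vocab_size), table)
assert selected == table[2]

# Collect the distance between every pair of distinct one-hot vectors.
distances = {
    round(dist(one_hot(i, vocab_size), one_hot(j, vocab_size)), 12)
    for i in range(vocab_size)
    for j in range(i + 1, vocab_size)
}
assert len(distances) == 1  # every pair is sqrt(2) apart
```

Note the size difference as well: each one-hot vector here has `vocab_size` entries, while each embedding has only `dim`.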

In transformers, the token embeddings are combined with positional encodings, typically by addition, before entering the first block. The output projection is sometimes tied to the input table, so both share one weight matrix and save parameters.
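
Both ideas can be sketched in a few lines. This sketch assumes the combination is element-wise addition and uses sinusoidal encodings, which is one common choice among several (learned position vectors are another). The table values and the function names are hypothetical.

```python
import math

def sinusoidal_position(pos, dim):
    """Sinusoidal encoding for one position (an assumed choice)."""
    enc = []
    for k in range(dim):
        angle = pos / (10000 ** (2 * (k // 2) / dim))
        enc.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return enc

dim = 4
# Hypothetical 3-token embedding table.
table = [[0.1 * (r + c) for c in range(dim)] for r in range(3)]

def embed_sequence(token_ids):
    """Add each token's embedding to the encoding of its position."""
    return [
        [t + p for t, p in zip(table[tok], sinusoidal_position(pos, dim))]
        for pos, tok in enumerate(token_ids)
    ]

def tied_logits(hidden):
    """Score every vocabulary entry by reusing the input table
    as the output projection (weight tying)."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in table]

inputs = embed_sequence([2, 0, 2])  # token 2 appears at positions 0 and 2
logits = tied_logits(inputs[0])     # one score per vocabulary entry
```

The same token at two positions yields two different input vectors, which is how the model can tell positions apart. With tying, the output step adds no new weight matrix because it reuses `table`.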

Key idea

Embedding layers are trainable lookup tables that convert discrete tokens into dense vectors, giving the model a compact space where similar tokens sit close together.

Check yourself

1. How is an embedding layer implemented?

2. Why prefer embeddings over one-hot vectors?