Machine Learning

Embeddings For Categorical Features

Learn dense vectors that capture how categories relate.

6 min read · advanced

Beyond one-hot

One-hot encoding places every category at the same distance from every other, so it cannot express that two categories are alike. A learned embedding instead maps each category to a short dense vector and places similar categories near each other in that space.
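To make the distance claim concrete, here is a minimal sketch comparing pairwise Euclidean distances under both encodings. The category names and the 2-d embedding values are hand-picked assumptions for illustration, not the output of a trained model.

```python
import math
from itertools import combinations

# Hypothetical categories, chosen only for illustration.
categories = ["espresso", "latte", "wrench"]


def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


one_hot_vectors = {
    name: one_hot(i, len(categories)) for i, name in enumerate(categories)
}

# Hand-picked 2-d vectors standing in for what training might produce.
embedding_vectors = {
    "espresso": [0.9, 0.1],
    "latte": [0.8, 0.2],
    "wrench": [-0.7, 0.9],
}

one_hot_distances = {
    (a, b): euclidean(one_hot_vectors[a], one_hot_vectors[b])
    for a, b in combinations(categories, 2)
}
embedding_distances = {
    (a, b): euclidean(embedding_vectors[a], embedding_vectors[b])
    for a, b in combinations(categories, 2)
}

for pair in one_hot_distances:
    print(pair, round(one_hot_distances[pair], 3), round(embedding_distances[pair], 3))
```

Every one-hot pair sits exactly sqrt(2) apart, while the dense vectors can put the two coffee drinks much closer to each other than either is to the wrench.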

How it learns

An embedding is a lookup table of vectors, one row per category, trained alongside the rest of the network.

  • The vector for each category starts at random values.
  • Gradients flowing back from the loss nudge the vectors so they help prediction.
  • Categories used in similar contexts drift toward similar vectors.
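The three steps above can be sketched with a toy model in plain Python. Everything here is an assumption made for illustration: the three categories, the two-number targets, and the single linear layer standing in for "the rest of the network". A real model would use a deep learning framework, but the mechanics are the same: look up a row, compute the loss, and update both the row and the layer from the gradient.

```python
import random

rng = random.Random(0)  # fixed seed so runs are repeatable

categories = ["espresso", "latte", "wrench"]  # hypothetical categories
index_of = {name: i for i, name in enumerate(categories)}
dim = 2          # embedding width
num_outputs = 2  # width of the toy prediction

# Toy targets (assumed): espresso and latte behave alike, wrench does not.
targets = {"espresso": [1.0, 0.0], "latte": [1.0, 0.0], "wrench": [0.0, 1.0]}

# Embedding table: one row per category, initialised randomly.
table = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in categories]
# Stand-in for the rest of the network: one linear layer, dim -> num_outputs.
weights = [[rng.uniform(-0.5, 0.5) for _ in range(num_outputs)] for _ in range(dim)]


def predict(row):
    """Multiply an embedding row by the linear layer."""
    return [sum(row[k] * weights[k][j] for k in range(dim)) for j in range(num_outputs)]


def total_loss():
    """Sum of halved squared errors over all categories."""
    return sum(
        0.5 * (p - t) ** 2
        for name in categories
        for p, t in zip(predict(table[index_of[name]]), targets[name])
    )


def distance(a, b):
    """Euclidean distance between the embedding rows of two categories."""
    row_a, row_b = table[index_of[a]], table[index_of[b]]
    return sum((x - y) ** 2 for x, y in zip(row_a, row_b)) ** 0.5


initial_loss = total_loss()
learning_rate = 0.1

for _ in range(2000):
    for name in categories:
        row = table[index_of[name]]  # the embedding lookup
        errors = [p - t for p, t in zip(predict(row), targets[name])]
        # Gradients of the loss with respect to the row and the layer.
        row_grad = [
            sum(errors[j] * weights[k][j] for j in range(num_outputs))
            for k in range(dim)
        ]
        weight_grad = [
            [row[k] * errors[j] for j in range(num_outputs)] for k in range(dim)
        ]
        for k in range(dim):
            row[k] -= learning_rate * row_grad[k]
            for j in range(num_outputs):
                weights[k][j] -= learning_rate * weight_grad[k][j]

final_loss = total_loss()
print("loss:", initial_loss, "->", final_loss)
print("espresso-latte distance:", distance("espresso", "latte"))
print("espresso-wrench distance:", distance("espresso", "wrench"))
```

Because espresso and latte share the same target, the only way to fit both is for their rows to move toward the same vector, which is the "similar contexts, similar vectors" effect in miniature.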

Why it helps

  • It captures relationships, so two similar products end up close together.
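  • It keeps the vector width small, typically tens to a few hundred dimensions, even for millions of categories, whereas a one-hot vector needs one dimension per category.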
  • The learned vectors can be reused in other models or for similarity search.
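As an illustration of the reuse point, the sketch below runs a brute-force similarity search over a small table of vectors. The vectors are hand-written placeholders (an assumption); in practice they would be copied out of a trained model, and a large catalogue would likely need an approximate nearest-neighbour index rather than a full scan.

```python
import math

# Hypothetical learned vectors; real ones would come from a trained model.
product_vectors = {
    "espresso": [0.9, 0.1, 0.3],
    "latte": [0.8, 0.2, 0.4],
    "cappuccino": [0.85, 0.15, 0.35],
    "wrench": [-0.6, 0.9, 0.0],
    "hammer": [-0.5, 0.8, 0.1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, vectors, top_k=2):
    """Rank every other category by cosine similarity to `query`."""
    scores = [
        (name, cosine_similarity(vectors[query], vec))
        for name, vec in vectors.items()
        if name != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scores[:top_k]]


coffee_neighbours = most_similar("espresso", product_vectors)
tool_neighbours = most_similar("wrench", product_vectors, top_k=1)
print(coffee_neighbours)  # ['cappuccino', 'latte']
print(tool_neighbours)    # ['hammer']
```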

Practical choices

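  • Common rules of thumb set the vector size near a low-order root of the category count (the cube root, or the fourth root in another widely cited version) or a small fraction of it; treat the result as a starting point and tune it on validation data.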
  • Reserve a slot for unknown categories seen only at prediction time.
  • Embeddings shine when there are many categories and plenty of training data.
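The first two choices can be sketched in a few lines. The helper name `suggested_dim`, its cap of 50, the `Vocabulary` class, and the convention of using row 0 for unknowns are all assumptions made for this example, not a standard API.

```python
def suggested_dim(num_categories, max_dim=50):
    """Cube-root rule of thumb, clamped to the range [2, max_dim].

    The clamp values are illustrative assumptions, not fixed constants.
    """
    return max(2, min(max_dim, round(num_categories ** (1 / 3))))


class Vocabulary:
    """Maps category strings to table rows, with row 0 reserved for unknowns."""

    UNKNOWN_INDEX = 0

    def __init__(self, training_categories):
        # Sort for a stable mapping; known categories start at row 1.
        unique = sorted(set(training_categories))
        self.index_of = {name: i + 1 for i, name in enumerate(unique)}

    def __len__(self):
        # Number of rows the embedding table needs, including the unknown row.
        return len(self.index_of) + 1

    def lookup(self, name):
        """Return the row for `name`, or the unknown row if it was never seen."""
        return self.index_of.get(name, self.UNKNOWN_INDEX)


vocab = Vocabulary(["espresso", "latte", "wrench", "latte"])
num_rows = len(vocab)               # 3 known categories + 1 unknown row
known_row = vocab.lookup("latte")   # a row in 1..3
unseen_row = vocab.lookup("mocha")  # falls back to row 0

print(num_rows, known_row, unseen_row)
print(suggested_dim(1_000), suggested_dim(1_000_000))
```

One design point worth noting: the unknown row only learns something useful if some training examples are routed to it, for instance by mapping very rare categories to the unknown index during training.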

Key idea

Categorical embeddings learn dense vectors that place similar categories near each other, scaling to huge category counts while capturing relationships that one-hot encoding ignores.

Check yourself

1. What advantage do embeddings have over one-hot encoding?

2. How are embedding vectors learned?