← Lessons

quiz vs the machine

Silver1050

Machine Learning

Word Embeddings

Turning words into dense vectors where meaning lives in geometry.

4 min read · intro · beat Silver to climb

What they are

A word embedding maps each word to a dense vector of real numbers, usually a few hundred dimensions. Unlike one hot vectors, which are huge and mostly zeros, embeddings are compact and capture meaning.

Why geometry matters

In a good embedding space, words used in similar contexts sit close together. The classic intuition is that the vector for king minus man plus woman lands near queen. Distances and directions encode relationships like gender, tense, and category.

How they are learned

Early methods such as word2vec and GloVe learn embeddings by predicting nearby words from a large text corpus.

  • Words that appear in similar contexts get similar vectors
  • Training pushes co occurring words together
  • The result is a lookup table from word to vector

Modern transformers learn embeddings as part of the model, and they become contextual, meaning the same word can get different vectors depending on the sentence around it.

Key idea

Embeddings represent words as dense vectors so that geometric closeness reflects semantic similarity.

Check yourself

Answer to earn rating on the learn ladder.

1. What is the main advantage of embeddings over one hot vectors?

2. What does it mean for an embedding to be contextual?