Word Embeddings

What they are

A word embedding maps each word to a dense vector of real numbers, usually a few hundred dimensions. Unlike one hot vectors, which are huge and mostly zeros, embeddings are compact and capture meaning.

Why geometry matters

In a good embedding space, words used in similar contexts sit close together. The classic intuition is that the vector for king minus man plus woman lands near queen. Distances and directions encode relationships like gender, tense, and category.

How they are learned

Early methods such as word2vec and GloVe learn embeddings by predicting nearby words from a large text corpus.

Words that appear in similar contexts get similar vectors
Training pushes co occurring words together
The result is a lookup table from word to vector

Modern transformers learn embeddings as part of the model, and they become contextual, meaning the same word can get different vectors depending on the sentence around it.

Key idea

Embeddings represent words as dense vectors so that geometric closeness reflects semantic similarity.

What they are

Why geometry matters

How they are learned

Key idea

Check yourself