What they are
A word embedding maps each word to a dense vector of real numbers, usually a few hundred dimensions. Unlike one hot vectors, which are huge and mostly zeros, embeddings are compact and capture meaning.
Why geometry matters
In a good embedding space, words used in similar contexts sit close together. The classic intuition is that the vector for king minus man plus woman lands near queen. Distances and directions encode relationships like gender, tense, and category.
How they are learned
Early methods such as word2vec and GloVe learn embeddings by predicting nearby words from a large text corpus.
- Words that appear in similar contexts get similar vectors
- Training pushes co occurring words together
- The result is a lookup table from word to vector
Modern transformers learn embeddings as part of the model, and they become contextual, meaning the same word can get different vectors depending on the sentence around it.
Key idea
Embeddings represent words as dense vectors so that geometric closeness reflects semantic similarity.