From symbols to vectors
A word embedding maps each word to a dense vector of real numbers, typically with a few hundred dimensions. A one-hot code is sparse and makes every word exactly as distant from every other word. Embeddings instead place related words near each other.
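To make the contrast concrete, here is a small sketch comparing Euclidean distances under a one-hot code and under a dense code. The three-word vocabulary and the dense vectors are made up for illustration, not learned. The point is only that one-hot vectors are all equally far apart, while dense vectors are free to put "cat" and "dog" closer together than either is to "car".

```python
import math

# Hypothetical three-word vocabulary, chosen only for illustration.
vocab = ["cat", "dog", "car"]

def one_hot(word):
    """Sparse one-hot vector: 1.0 at the word's index, 0.0 everywhere else."""
    return [1.0 if w == word else 0.0 for w in vocab]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Every pair of distinct one-hot vectors is exactly sqrt(2) apart.
one_hot_dists = {
    (a, b): euclidean(one_hot(a), one_hot(b))
    for a in vocab for b in vocab if a < b
}

# Hand-picked dense vectors (hypothetical, not trained): cat and dog sit close.
dense = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}
dense_dists = {
    (a, b): euclidean(dense[a], dense[b])
    for a in vocab for b in vocab if a < b
}

print(one_hot_dists)
print(dense_dists)
```

In a real model, the dense coordinates come from training rather than from hand-picking. The geometric point stays the same, though: only the dense code can express that some pairs of words are more alike than others.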
The distributional idea
Embeddings are learned under the distributional hypothesis: a word is known by the company it keeps. Models such as word2vec and GloVe scan large corpora and adjust the vectors so that words appearing in similar contexts end up with similar vectors.
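- word2vec trains a shallow neural network either to predict a word from its neighbors (the CBOW variant) or to predict the neighbors from the word (skip-gram).
- GloVe factorizes a matrix of global counts of how often words co-occur, fitting vectors whose dot products roughly match the logarithm of those counts.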
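Both approaches start from the same raw signal: which words appear near which. The sketch below counts co-occurrences within a fixed window and compares words by the cosine similarity of their count rows. It is a toy illustration of the distributional idea, not the training procedure of either word2vec or GloVe. The two-sentence corpus and the window size of 1 are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def cooccurrence(sentences, window=1):
    """Count how often each word appears within `window` positions of another word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus: "cat" and "dog" occur in identical contexts.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
counts = cooccurrence(corpus, window=1)

print(cosine(counts["cat"], counts["dog"]))  # close to 1.0: same neighbors
print(cosine(counts["cat"], counts["sat"]))  # 0.0: no neighbors in common
```

Even though "cat" and "dog" never appear together, their rows match because they share neighbors. Learned embeddings compress this kind of evidence from a huge, sparse count table into short, dense vectors.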
Geometry that means something
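Because consistent directions in the space capture regularities such as gender or verb tense, vector arithmetic can expose analogies. The classic example is that the vector for king, minus man, plus woman, lands near the vector for queen. Distances between vectors become proxies for semantic similarity. The angles between them, usually measured as cosine similarity, serve the same role.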
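The analogy test can be sketched in a few lines. The procedure computes king − man + woman and returns the remaining word whose vector has the highest cosine similarity to the result. It excludes the three input words, as analogy evaluations commonly do. The 3-dimensional vectors below are hypothetical and hand-designed so that one axis loosely tracks royalty and another tracks gender. Real embeddings are learned and have far more dimensions.

```python
import math

# Hypothetical hand-designed vectors (NOT learned): axis 0 ~ "royalty",
# axis 1 ~ gender, axis 2 ~ an unrelated feature.
vectors = {
    "king":  [0.90,  0.80, 0.10],
    "queen": [0.88, -0.75, 0.12],
    "man":   [0.10,  0.80, 0.00],
    "woman": [0.10, -0.80, 0.00],
    "boy":   [0.00,  0.70, 0.10],
    "apple": [0.00,  0.00, 1.00],
}

def cosine(u, v):
    """Cosine of the angle between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c, vectors):
    """Word closest (by cosine) to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(analogy("king", "man", "woman", vectors))  # queen
```

The result is "near" queen rather than exactly queen. The target vector here is (0.9, −0.8, 0.1), which is close to but not identical to the queen vector. The search simply returns the best-matching candidate.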
Limits
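A static embedding assigns one vector to each word type. As a result, "bank" gets the same vector whether it refers to the side of a river or to an institution that holds money. Later contextual models, which compute a separate vector for each occurrence of a word from its surrounding sentence, were developed to address this limitation.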
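The limitation follows directly from how a static embedding is used: it is a table lookup keyed only on the word. The sketch below uses a hypothetical 2-dimensional table to show that "bank" receives an identical vector in two unrelated sentences. For contrast, it also includes a deliberately crude context-mixing function that averages a word's vector with its neighbors' vectors. That function is a made-up toy, not how real contextual models compute their vectors. It only demonstrates that once context enters the computation, the two occurrences can come apart.

```python
# Hypothetical static table: exactly one vector per word, whatever the sentence.
STATIC = {
    "river": [0.0, 1.0],
    "money": [1.0, 0.0],
    "bank":  [0.5, 0.5],
}
UNKNOWN = [0.0, 0.0]  # fallback for words missing from the toy table

def static_vector(tokens, i):
    """Static lookup: the vector for tokens[i] ignores every other token."""
    return STATIC.get(tokens[i], UNKNOWN)

def toy_contextual_vector(tokens, i, window=3):
    """Toy illustration only (NOT a real contextual model):
    average the vectors of tokens[i] and its neighbors within `window`."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    span = [STATIC.get(t, UNKNOWN) for t in tokens[lo:hi]]
    return [sum(col) / len(span) for col in zip(*span)]

s1 = "fish swim near the river bank".split()
s2 = "she deposited money at the bank".split()
i1, i2 = s1.index("bank"), s2.index("bank")

print(static_vector(s1, i1) == static_vector(s2, i2))  # True
print(toy_contextual_vector(s1, i1), toy_contextual_vector(s2, i2))
```

The static lookup cannot tell the two uses apart, because it never sees the sentence. The toy version leans toward the "river" axis in the first sentence and toward the "money" axis in the second.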
Key idea
Word embeddings turn discrete words into dense vectors learned from context, so that geometric closeness reflects meaning, though one vector per word cannot capture different senses.