GloVe Embeddings
GloVe, short for Global Vectors, learns word embeddings from co-occurrence counts gathered across the entire corpus. Where skip-gram slides a window over the text and predicts local neighbors, GloVe first builds a large table of how often each pair of words appears together, then fits vectors to that table.
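As a rough illustration of the table-building step, here is a minimal sketch that counts co-occurrences over a toy sentence. The function name `cooccurrence_counts`, the window size of 2, and the sentence itself are assumptions made for this example; real implementations work on far larger corpora and may also down-weight pairs that are farther apart.

```python
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    """Count how often each ordered word pair appears within `window` tokens."""
    counts = Counter()
    for i, word in enumerate(tokens):
        # Look only to the right, then record both orders so the table is symmetric.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            counts[(word, tokens[j])] += 1
            counts[(tokens[j], word)] += 1
    return counts

# Toy corpus, invented for illustration only.
tokens = "ice is solid and steam is gas".split()
counts = cooccurrence_counts(tokens, window=2)

print(counts[("ice", "solid")])  # pair seen inside the window
print(counts[("ice", "gas")])    # pair never seen together; Counter returns 0
```

The whole corpus is read once to fill this table; training then works from the table rather than from the raw text.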
The central insight is about ratios: the ratio of co-occurrence probabilities carries meaning. Ice co-occurs with solid far more often than steam does, while steam co-occurs with gas more often than ice does. GloVe trains vectors so that their dot products, plus bias terms, match the logarithm of the co-occurrence counts. Differences between word vectors then correspond to the logarithms of those ratios.
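The ratio argument can be made concrete with a small sketch. All counts and totals below are hypothetical numbers chosen only to show the pattern, and the neutral context word "water" is added here for contrast; the helper names `prob` and `ratio` are likewise made up for this example.

```python
import math

# Hypothetical co-occurrence counts: cooc[word][context].
cooc = {
    "ice":   {"solid": 80, "gas": 4,  "water": 300},
    "steam": {"solid": 2,  "gas": 40, "water": 280},
}
# Hypothetical total number of context words seen for each word.
totals = {"ice": 10000, "steam": 10000}

def prob(context, word):
    """Estimate P(context | word) from the counts."""
    return cooc[word][context] / totals[word]

def ratio(context):
    """P(context | ice) / P(context | steam)."""
    return prob(context, "ice") / prob(context, "steam")

for context in ["solid", "gas", "water"]:
    r = ratio(context)
    # The log of the ratio is a difference of log probabilities.
    print(context, round(r, 2), round(math.log(r), 2))
```

A ratio far above 1 marks a context tied to ice, a ratio far below 1 marks one tied to steam, and a ratio near 1 marks a context that does not separate the two words.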
GloVe blends two traditions:
- Count-based methods that summarize global statistics in one pass
- Prediction-based methods like word2vec that learn from local prediction objectives
A weighting function keeps very frequent pairs from dominating and rare pairs from adding noise. The result is embeddings with the same useful geometry as word2vec, where analogies appear as vector offsets.
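One way to sketch the weighting function and the error it scales, for a single word pair, is shown below. The form of `weight` and the defaults `x_max=100` and `alpha=0.75` follow the original GloVe paper; the function names, the plain-list vectors, and the sample values are assumptions made for this example, not a full training loop.

```python
import math

def weight(x, x_max=100.0, alpha=0.75):
    """Grow with the count up to x_max, then stay flat at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def pair_loss(w_i, w_j, b_i, b_j, x_ij):
    """Weighted squared error between the model score and the log count.

    Only pairs with x_ij > 0 are used, so log(x_ij) is always defined.
    """
    score = sum(a * b for a, b in zip(w_i, w_j)) + b_i + b_j
    return weight(x_ij) * (score - math.log(x_ij)) ** 2

# Rare pairs get a small weight; frequent pairs are capped at 1.
print(weight(1), weight(50), weight(100), weight(10000))

# Illustrative vectors whose dot product is nowhere near log(10).
print(pair_loss([0.5, 0.1], [0.2, 0.3], 0.0, 0.0, 10))
```

Training sums this term over every nonzero entry of the co-occurrence table and adjusts the vectors and biases to reduce the total.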
In practice, GloVe and skip-gram perform similarly. GloVe can be appealing because it uses corpus-wide statistics efficiently and trains on a compact co-occurrence matrix rather than streaming over the text repeatedly. Like word2vec, it still assigns one static vector per word.
Key idea
GloVe fits word vectors so their dot products match log co-occurrence counts, learning embeddings from global corpus statistics.