Word2vec Skip-Gram

Word2vec learns dense vectors, called embeddings, in which words with similar meanings sit close together. The skip-gram variant trains on a deceptively simple task: given a center word, predict the words around it.

You slide a context window across text. For each center word the model tries to raise the probability of its true neighbors and lower the probability of random words. As training proceeds, words that share neighbors drift toward the same region of space.
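The sliding window can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `skipgram_pairs`, the `window` parameter, and the sample sentence are assumptions made for the example.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for each token and its neighbors within `window`."""
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the window to the edges of the token list.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own neighbor
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical sample text, chosen only for illustration.
tokens = "the quick brown fox jumps".split()
pairs = skipgram_pairs(tokens, window=1)
for center, context in pairs:
    print(center, "->", context)
```

Each pair is one positive training example: the model is asked to give the context word a high probability given the center word.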

This is the distributional hypothesis in action. Words that occur in similar contexts tend to have similar meanings, so the network learns geometry that reflects semantics.

Two ideas make it efficient:

Negative sampling, which replaces an expensive full-vocabulary softmax with a handful of randomly sampled non-neighbors to push away
A modest window size, which balances syntax against topic: small windows favor syntactic similarity, larger ones topical similarity
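To make negative sampling concrete, here is a rough sketch of a single training step on one (center, context) pair. It assumes the usual negative-sampling loss, -log σ(c·o) - Σ log σ(-c·n), and plain gradient descent. The helper names (`sgns_step`, `rand_vec`), the dimension, and the learning rate are illustrative choices, and random vectors stand in for negatives that a real implementation would draw from the vocabulary.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgns_step(center, context, negatives, lr=0.1):
    """Apply one gradient step in place; return the loss measured before the update."""
    pos = sigmoid(dot(center, context))
    loss = -math.log(pos)
    # Gradients of -log(sigmoid(center . context)) pull the true pair together.
    grad_center = [(pos - 1.0) * x for x in context]
    grad_context = [(pos - 1.0) * x for x in center]
    grad_negatives = []
    for neg in negatives:
        s = sigmoid(dot(center, neg))
        loss += -math.log(1.0 - s)  # same as -log(sigmoid(-center . neg))
        # Gradients of this term push the center away from the negative.
        for k, x in enumerate(neg):
            grad_center[k] += s * x
        grad_negatives.append([s * x for x in center])
    # All gradients were computed from the old vectors; now update in place.
    for k in range(len(center)):
        center[k] -= lr * grad_center[k]
        context[k] -= lr * grad_context[k]
    for neg, grad in zip(negatives, grad_negatives):
        for k in range(len(neg)):
            neg[k] -= lr * grad[k]
    return loss


random.seed(0)
dim = 8  # illustrative embedding size


def rand_vec():
    return [random.uniform(-0.5, 0.5) for _ in range(dim)]


center, context = rand_vec(), rand_vec()
negatives = [rand_vec() for _ in range(3)]  # stand-ins for sampled non-neighbors
losses = [sgns_step(center, context, negatives) for _ in range(50)]
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

The cost of a step depends on the number of negatives, not on the vocabulary size, which is the efficiency gain over a full softmax.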

The payoff is striking geometry. Vector arithmetic captures relationships, so king minus man plus woman lands near queen. Embeddings became a reusable foundation, feeding classifiers, search, and later neural networks.
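The analogy arithmetic can be illustrated with a toy example. The two-dimensional vectors below are invented by hand so that one axis loosely tracks royalty and the other gender; they are not trained embeddings, and the `analogy` helper is a hypothetical name. The sketch shows the mechanics only: build the target vector, then pick the nearest remaining word by cosine similarity.

```python
import math


def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den


def analogy(vectors, a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the query words."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hand-made toy vectors (assumed values, not learned from data).
toy = {
    "king": [0.9, 0.8],
    "queen": [0.9, -0.8],
    "man": [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.7, 0.1],
}

result = analogy(toy, "king", "man", "woman")
print(result)  # queen
```

Excluding the query words from the candidates is common practice, because the target vector often stays closest to one of its own inputs.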

The catch is that each word gets one vector regardless of sense, so bank by a river and bank with money collapse into a single point. Contextual models later addressed this by giving a word a different vector in each sentence, but skip-gram set the template.

Key idea

Skip-gram learns word embeddings by predicting neighboring words, placing words with similar contexts near each other in vector space.
