t-SNE for Visualization
t-SNE, short for t-distributed stochastic neighbor embedding, is a nonlinear technique for mapping high-dimensional data down to two or three dimensions so you can see its structure.
Preserving neighborhoods
t-SNE focuses on keeping local relationships. It converts distances into probabilities that two points are neighbors, both in the original space and in the low-dimensional map, then arranges the map so those probabilities match.
- Nearby points in the data stay nearby in the plot.
- The heavy-tailed t-distribution in the map lets moderately distant points sit farther apart, which keeps them from crowding together.
- The result often shows clear visual clusters.
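The matching step can be sketched in a few lines of plain Python. This is a simplified illustration, not a full t-SNE: it assumes one shared Gaussian bandwidth `sigma` (real t-SNE tunes one per point from the perplexity) and normalizes over all pairs at once. The toy points and all function names are made up for the example. The mismatch between the two sets of probabilities is measured with the Kullback-Leibler divergence, which is the quantity t-SNE minimizes.

```python
import math

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def gaussian_kernel(sq_dist, sigma=1.0):
    # Light tails: similarity decays exponentially with squared distance.
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def student_t_kernel(sq_dist):
    # Heavy tails: Student t with one degree of freedom.
    return 1.0 / (1.0 + sq_dist)

def neighbor_probabilities(points, kernel):
    # Kernel weight for every pair i != j, normalized so all entries sum to 1.
    n = len(points)
    weights = [
        [0.0 if i == j else kernel(squared_distance(points[i], points[j]))
         for j in range(n)]
        for i in range(n)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

def kl_divergence(P, Q):
    # KL(P || Q), skipping the zero diagonal entries of P.
    return sum(
        p * math.log(p / q)
        for row_p, row_q in zip(P, Q)
        for p, q in zip(row_p, row_q)
        if p > 0.0
    )

# Toy data (hypothetical): two tight pairs of points in 3 dimensions.
high_dim = [(0, 0, 0), (0.5, 0, 0), (5, 5, 5), (5.5, 5, 5)]
good_map = [(0, 0), (0.5, 0), (4, 4), (4.5, 4)]   # keeps each pair together
bad_map = [(0, 0), (4, 4), (0.5, 0), (4.5, 4)]    # splits each pair apart

P = neighbor_probabilities(high_dim, gaussian_kernel)
Q_good = neighbor_probabilities(good_map, student_t_kernel)
Q_bad = neighbor_probabilities(bad_map, student_t_kernel)

kl_good = kl_divergence(P, Q_good)
kl_bad = kl_divergence(P, Q_bad)
print(f"KL for the map that keeps neighbors together: {kl_good:.3f}")
print(f"KL for the map that splits neighbors apart:   {kl_bad:.3f}")
```

The map that keeps neighbors together gets the lower divergence, so an optimizer following this objective is pushed toward it. Comparing `student_t_kernel(25.0)` with `gaussian_kernel(25.0)` also shows how much more weight the heavy tail leaves for distant pairs.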
The perplexity knob
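The main hyperparameter is perplexity, which roughly sets the effective number of neighbors each point considers. Small values emphasize tight local structure, while larger values capture broader groupings. Typical values fall between 5 and 50, and it is worth comparing plots at several settings.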
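To make "effective number of neighbors" concrete, here is a minimal sketch of how a perplexity target can be turned into a kernel width for a single point. It assumes the usual definition of perplexity as 2 raised to the entropy (in bits) of that point's neighbor distribution, and it finds the Gaussian precision `beta = 1 / (2 * sigma**2)` by bisection. The distances and function names are invented for illustration; library implementations differ in detail.

```python
import math

def conditional_probabilities(sq_dists, beta):
    # Neighbor distribution of one point, given squared distances to the others.
    shift = min(sq_dists)  # subtracting the minimum avoids underflow
    weights = [math.exp(-beta * (d - shift)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    entropy_bits = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy_bits

def calibrate_beta(sq_dists, target, tol=1e-5, max_iter=200):
    # Bisection works because perplexity only decreases as beta grows.
    lo, hi = 0.0, None
    beta = 1.0
    for _ in range(max_iter):
        perp = perplexity(conditional_probabilities(sq_dists, beta))
        if abs(perp - target) < tol:
            break
        if perp > target:
            # Too many effective neighbors: narrow the kernel.
            lo = beta
            beta = beta * 2.0 if hi is None else (beta + hi) / 2.0
        else:
            # Too few effective neighbors: widen the kernel.
            hi = beta
            beta = (lo + beta) / 2.0
    return beta

# Hypothetical squared distances from one point to its 10 companions.
sq_dists = [1, 2, 4, 9, 16, 25, 36, 49, 64, 81]

beta_tight = calibrate_beta(sq_dists, target=3.0)
beta_broad = calibrate_beta(sq_dists, target=6.0)

for name, beta in [("perplexity 3", beta_tight), ("perplexity 6", beta_broad)]:
    sigma = math.sqrt(1.0 / (2.0 * beta))
    print(f"{name}: sigma = {sigma:.3f}")
```

The lower perplexity target yields the larger `beta`, that is, a narrower kernel that concentrates probability on the closest few points. The target must lie between 1 and the number of other points for the search to have a solution.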
Reading the output safely
t-SNE is a powerful visualizer, but its plots are easy to misread.
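- Cluster sizes in the plot do not reflect real density, because t-SNE adapts its notion of distance to each region and tends to even out dense and sparse groups.
- Distances between clusters are not reliable; a wide gap in the plot says little about how far apart the groups are in the original space.
- Running it twice can give different layouts, because the optimization starts from a random initialization and the objective is non-convex.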
Use it to spot structure, not to measure it.
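The run-to-run variation is easy to see for yourself. The sketch below assumes scikit-learn and NumPy are installed and uses scikit-learn's `TSNE` class on synthetic data invented for the example (two Gaussian blobs in 10 dimensions). It embeds the same data with two different seeds and compares the resulting coordinates.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): two blobs of 30 points each.
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(30, 10)),
    rng.normal(loc=6.0, scale=1.0, size=(30, 10)),
])

def embed(seed):
    # init="random" makes the dependence on the seed explicit.
    tsne = TSNE(n_components=2, perplexity=10, init="random", random_state=seed)
    return tsne.fit_transform(X)

map_a = embed(1)
map_b = embed(2)

# The coordinates differ between runs even though the data are identical.
print("same layout:", np.allclose(map_a, map_b))
print("largest coordinate difference:", float(np.abs(map_a - map_b).max()))
```

Both runs would normally still separate the two blobs, but their position, orientation, and spacing on the page change with the seed. Fixing `random_state` is the usual way to keep a figure repeatable, and comparing a few seeds is a cheap check on which features of a plot are stable.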
Key idea
t-SNE embeds data into low dimensions by matching neighbor probabilities, revealing local structure that you should read qualitatively, not metrically.