The Wasserstein GAN
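The Wasserstein GAN (WGAN) changes the quantity a GAN optimizes. Instead of training a discriminator to output the probability that a sample is real, it trains a critic to estimate the Wasserstein-1 (earth mover's) distance between the real and generated distributions, which gives the generator smoother gradients that do not vanish.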
The distance idea
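- The Wasserstein distance is the minimum cost, counted as mass times distance moved, of transporting the mass of one distribution so that it matches the other.
- Unlike the Jensen-Shannon divergence that the original GAN loss implicitly minimizes, it stays meaningful when the real and fake distributions barely overlap or do not overlap at all.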
- This means the generator gets a useful gradient early in training rather than a flat or vanishing one.
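A small worked example can make the overlap point concrete. The sketch below is illustrative only: it uses the fact that for two equal-size 1D samples the empirical Wasserstein-1 distance is the mean gap between sorted values, and it uses the Jensen-Shannon divergence as a stand-in for what the original GAN loss measures. The helper names (wasserstein_1d, js_divergence, point_mass) are hypothetical, not from any library.

```python
import math

def wasserstein_1d(xs, ys):
    """Empirical W1 between two equal-size 1D samples: mean gap between sorted values."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL sum.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def point_mass(position, n_bins):
    """Histogram with all of its mass in a single bin."""
    return [1.0 if i == position else 0.0 for i in range(n_bins)]

# Real data sits at 0; the fake distribution moves closer step by step.
real = [0.0] * 100
for shift in (8, 4, 1):
    fake = [float(shift)] * 100
    w1 = wasserstein_1d(real, fake)
    js = js_divergence(point_mass(0, 10), point_mass(shift, 10))
    print(f"shift={shift}  W1={w1:.3f}  JS={js:.4f}")
```

As the fake point mass approaches the real one, W1 shrinks in proportion to the shift, while the JS divergence stays at log 2 for every non-overlapping pair, so only W1 tells the generator which direction is an improvement.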
What changes in practice
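- The discriminator becomes a critic that outputs an unbounded real-valued score, not a probability.
- The critic must be 1-Lipschitz (or K-Lipschitz, which only rescales the distance). The original WGAN enforced this by clipping the critic's weights; WGAN-GP later replaced clipping with a gradient penalty.
- The critic loss estimates the Wasserstein distance and tends to correlate with sample quality, so it serves as a training curve you can monitor.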
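The points above can be sketched with a deliberately tiny 1D example. Everything here is an assumption made for illustration: the critic is a single weight f(x) = w * x, the generator is G(z) = z + b, gradients are written out by hand, and the function name and hyperparameters are hypothetical. It shows weight clipping only, not the gradient penalty, and it is not a practical WGAN.

```python
import random

def train_toy_wgan(real_mean=3.0, clip=0.5, n_critic=5, steps=400,
                   lr_critic=0.05, lr_gen=0.05, batch=64, seed=0):
    """Toy 1D WGAN with a linear critic and weight clipping (illustrative only)."""
    rng = random.Random(seed)
    w = 0.0  # critic f(x) = w * x: a real-valued score, not a probability
    b = 0.0  # generator G(z) = z + b, with z ~ N(0, 1)
    history = []

    def sample_means():
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        return sum(real) / batch, sum(fake) / batch

    for _ in range(steps):
        # Several critic updates per generator update.
        for _ in range(n_critic):
            mean_real, mean_fake = sample_means()
            # Critic ascends mean f(real) - mean f(fake) = w * (mean_real - mean_fake).
            w += lr_critic * (mean_real - mean_fake)
            # Weight clipping keeps |w| <= clip, so f is clip-Lipschitz.
            w = max(-clip, min(clip, w))

        # Generator descends -mean f(G(z)) = -w * mean(z + b); d/db is -w.
        b -= lr_gen * (-w)

        # Critic objective on a fresh batch: a clip-scaled distance estimate.
        mean_real, mean_fake = sample_means()
        history.append(w * (mean_real - mean_fake))

    return b, w, history

b, w, history = train_toy_wgan()
print(f"generator offset b={b:.2f} (real mean is 3.0), critic weight w={w:.2f}")
print(f"critic objective: first={history[0]:.2f}, last={history[-1]:.2f}")
```

In this toy the generator's gradient is simply -w, so it keeps a useful magnitude while the critic sits at its clipping bound, and the recorded critic objective falls as the generated mean approaches the real mean. Because the critic is linear, b oscillates around the target rather than settling exactly.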
Why it helps
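- Training tends to be more stable and less prone to mode collapse.
- You can train the critic close to optimality without the generator's gradients vanishing, whereas a near-optimal discriminator in the original GAN saturates and passes back almost no gradient.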
Key idea
A WGAN replaces the classifier loss with an estimate of the Wasserstein distance, computed by a Lipschitz-constrained critic. This gives smoother gradients, more stable training, and a loss that tracks sample quality.
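For reference, the key idea can be written as a formula using the Kantorovich-Rubinstein dual form of the Wasserstein-1 distance. The symbols are notation chosen here, not taken from the notes: P_r is the real data distribution, P_g the generator's distribution, f the critic, G the generator, and p(z) the noise prior.

```latex
% Dual form: the supremum runs over all 1-Lipschitz functions f
W(P_r, P_g) = \sup_{\|f\|_L \le 1}
  \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)]

% Resulting WGAN objective: the critic f approximates the supremum,
% and the generator G minimizes the estimated distance
\min_G \max_{\|f\|_L \le 1}
  \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{z \sim p(z)}[f(G(z))]
```

In practice the critic is a neural network restricted to an approximately Lipschitz family, so the inner maximum is an approximation of the true distance, possibly up to a constant scale.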