The Wasserstein GAN

Replace the GAN loss with the earth mover distance for smoother, more stable training.

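The Wasserstein GAN, or WGAN, changes what the GAN optimizes. Instead of a loss built on a classifier's probability that a sample is real, it uses an estimate of the earth mover (Wasserstein) distance between the real and generated distributions, which gives the generator smoother, more informative gradients.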

The distance idea

The Wasserstein distance is the minimum cost of moving probability mass so that one distribution matches the other, where the cost is the amount of mass moved times the distance it travels.
Unlike the original GAN loss, it stays meaningful even when the real and fake distributions barely overlap.
This means the generator gets a useful gradient early in training rather than a flat or vanishing one.
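As a small illustration of the point above, the sketch below compares the two quantities on distributions that do not overlap at all. It assumes 1-D samples of equal size, where the Wasserstein distance reduces to the mean gap between sorted samples, and it uses the Jensen-Shannon divergence as a stand-in for what the original GAN loss measures when the discriminator is optimal. The function names are illustrative, not taken from any library.

```python
import math

def wasserstein_1d(xs, ys):
    """W1 between two equal-size 1-D samples with uniform weights."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(xs), sorted(ys)
    # In 1-D the optimal transport plan matches sorted samples in order.
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) of two discrete distributions."""
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

real = [0.0, 0.0, 0.0, 0.0]
for shift in (4.0, 2.0, 1.0):
    fake = [x + shift for x in real]
    # Disjoint supports: real puts all its mass on one bin, fake on another.
    js = js_divergence([1.0, 0.0], [0.0, 1.0])
    print(f"shift={shift}  W1={wasserstein_1d(real, fake)}  JS={js:.4f}")
```

The Wasserstein value shrinks as the fake samples move toward the real ones, while the Jensen-Shannon value stays at log 2 for every shift, so only the former tells the generator which direction is an improvement.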

What changes in practice

The discriminator becomes a critic that outputs a real number, not a probability.
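The critic must be Lipschitz constrained, meaning its output cannot change faster than a fixed rate as its input changes. The original WGAN enforced this by weight clipping; later work replaced clipping with a gradient penalty.
The critic loss tends to correlate with sample quality, so it acts as a real training signal you can watch.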
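The changes listed above can be sketched with a toy linear critic in NumPy. This is a minimal illustration under stated assumptions, not a full WGAN: the critic is `f(x) = x @ w + b`, the data are two hypothetical Gaussian blobs, the clipping bound `c=0.5` and learning rate are arbitrary illustrative values, and the gradient is written by hand because the critic is linear.

```python
import numpy as np

def critic(x, w, b):
    """Toy linear critic: an unbounded real-valued score, with no sigmoid."""
    return x @ w + b

def critic_loss(real, fake, w, b):
    # The critic wants high scores on real data and low scores on fakes,
    # so it minimizes mean(f(fake)) - mean(f(real)).
    return critic(fake, w, b).mean() - critic(real, w, b).mean()

def generator_loss(fake, w, b):
    # The generator wants its samples to receive high critic scores.
    return -critic(fake, w, b).mean()

def clip_weights(w, c):
    # Weight clipping: force every parameter into [-c, c] after each update.
    return np.clip(w, -c, c)

def gradient_penalty_linear(w):
    # For a linear critic the gradient w.r.t. the input is w everywhere, so
    # the penalty (||grad_x f(x)||_2 - 1)^2 has this closed form.
    return (np.linalg.norm(w) - 1.0) ** 2

rng = np.random.default_rng(0)
real = rng.normal(loc=2.0, size=(256, 2))  # stand-in for real data
fake = rng.normal(loc=0.0, size=(256, 2))  # stand-in for generator output
w, b, lr = np.zeros(2), 0.0, 0.05

for _ in range(100):
    # Hand-derived gradient of critic_loss w.r.t. w for the linear critic.
    grad_w = fake.mean(axis=0) - real.mean(axis=0)
    w = clip_weights(w - lr * grad_w, c=0.5)

# The negated critic loss is the (scaled) distance estimate you would monitor.
wasserstein_estimate = -critic_loss(real, fake, w, b)
print(w, wasserstein_estimate)
```

Without the clipping step the weights would grow without bound and the estimate would be meaningless, which is why some form of Lipschitz constraint is needed. In a real model the gradient penalty is evaluated with automatic differentiation at sampled points rather than in closed form.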

Why it helps

Training is typically more stable and less prone to mode collapse.
You can train the critic to optimality without the generator gradients vanishing.
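To make the vanishing-gradient contrast concrete, the sketch below compares the gradient the generator receives in two simplified settings. It assumes the original minimax generator loss `log(1 - D(x))` with `D = sigmoid(logit)`, and a toy linear critic `f(x) = slope * x` for the WGAN side; both are deliberate simplifications and the function names are illustrative.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def gan_generator_grad(logit):
    """d/d(logit) of log(1 - sigmoid(logit)), which simplifies to -sigmoid(logit)."""
    return -sigmoid(logit)

def wgan_generator_grad(slope):
    """d/dx of the generator loss -f(x) for a linear critic f(x) = slope * x."""
    return -slope

# A more negative logit means the discriminator rejects the fake more confidently.
for logit in (0.0, -5.0, -20.0):
    print(logit, gan_generator_grad(logit), wgan_generator_grad(1.0))
```

As the discriminator grows more confident, the first gradient shrinks toward zero, while the critic's gradient keeps the same magnitude however well the critic is trained.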

Key idea

A WGAN replaces the classifier loss with the Wasserstein distance and uses a Lipschitz constrained critic, giving smoother gradients, more stable training, and a loss that tracks sample quality.
