Machine Learning

The Wasserstein GAN

Replace the standard GAN loss with the earth mover distance for smoother, more stable training.

5 min read · advanced

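The Wasserstein GAN, or WGAN, changes what the GAN measures. The original GAN trains a discriminator to output the probability that a sample is real, which (with an optimal discriminator) amounts to minimizing the Jensen-Shannon divergence between the real and generated distributions. WGAN instead minimizes the earth mover (Wasserstein-1) distance between them, which gives smoother, more informative gradients.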
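For reference, the distance and its dual form can be written as below. The symbols are introduced here for illustration: P_r is the real data distribution, P_g the generator's distribution, and Π(P_r, P_g) the set of transport plans (joint distributions whose marginals are P_r and P_g). The second line is the Kantorovich-Rubinstein duality; the WGAN critic plays the role of f, which is why it has to be 1-Lipschitz.

```latex
W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\big[\lVert x - y \rVert\big]
            = \sup_{\lVert f \rVert_L \le 1} \Big( \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)] \Big)
```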

The distance idea

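  • The Wasserstein distance is the minimum total cost of moving probability mass so that one distribution becomes the other, where moving an amount of mass over a distance costs amount × distance.
  • Unlike the Jensen-Shannon divergence behind the original GAN loss, it stays meaningful even when the real and fake distributions barely overlap or do not overlap at all: it keeps growing with how far apart they are instead of saturating at a constant.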
  • This means the generator gets a useful gradient early in training rather than a flat or vanishing one.
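The contrast above shows up in a classic toy case: put all real mass at 0 and all generated mass at theta. The sketch below is a minimal stdlib illustration, not code from any GAN library; wasserstein_1d and js_divergence are hypothetical helpers, and the 1-D distance assumes equal-size samples, where the optimal transport plan simply pairs sorted values.

```python
import math


def wasserstein_1d(xs, ys):
    """Earth mover (W1) distance between two equal-size 1-D empirical samples.
    In one dimension the optimal transport plan pairs the sorted samples in order."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between discrete distributions
    given as {point: probability} dicts."""
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in set(p) | set(q)}

    def kl_to_m(d):
        # KL(d || m), skipping zero-probability points.
        return sum(prob * math.log(prob / m[k]) for k, prob in d.items() if prob > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


if __name__ == "__main__":
    # Real data: all mass at 0. "Generated" data: all mass at theta.
    for theta in (0.5, 2.0, 10.0):
        w1 = wasserstein_1d([0.0] * 4, [theta] * 4)
        js = js_divergence({0.0: 1.0}, {theta: 1.0})
        print(f"theta={theta:5}: W1={w1:.3f}  JS={js:.3f}")
```

W1 equals theta and so tells the generator how far off it is, while the Jensen-Shannon divergence stays at log 2 ≈ 0.693 for every nonzero theta, so its gradient with respect to theta is zero.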

What changes in practice

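  • The discriminator becomes a critic that outputs an unbounded real-valued score, not a probability.
  • The critic must be Lipschitz-constrained (1-Lipschitz, or K-Lipschitz for a fixed K, which only rescales the distance). The original WGAN enforced this by clipping the critic's weights to a small range; WGAN-GP later replaced clipping with a gradient penalty.
  • The critic's loss is an estimate of the Wasserstein distance and correlates with sample quality, so it acts as a real training signal you can watch.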
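The practical changes above can be sketched as a minimal 1-D illustration in plain Python rather than any framework's API. critic_loss, generator_loss, clip, gradient_penalty and train_linear_critic are hypothetical helpers. c = 0.01 and lam = 10 match the defaults reported for the original WGAN and WGAN-GP, and the finite-difference derivative in gradient_penalty stands in for the autodiff a real implementation would use.

```python
import random


def critic_loss(f, real, fake):
    """WGAN critic loss to minimize: E[f(fake)] - E[f(real)].
    Its negative is the critic's estimate of the Wasserstein distance."""
    return sum(map(f, fake)) / len(fake) - sum(map(f, real)) / len(real)


def generator_loss(f, fake):
    """WGAN generator loss to minimize: -E[f(fake)]."""
    return -sum(map(f, fake)) / len(fake)


def clip(value, c):
    """Original-WGAN weight clipping: clamp a parameter to [-c, c]."""
    return max(-c, min(c, value))


def gradient_penalty(f, real, fake, lam=10.0, eps=1e-4, rng=random):
    """WGAN-GP-style penalty, 1-D sketch: lam * E[(|f'(x_hat)| - 1)^2] at random
    points x_hat on the segments between paired real and fake samples.
    f' is a central finite difference here; real code would use autodiff."""
    total = 0.0
    for xr, xf in zip(real, fake):
        t = rng.random()
        x_hat = t * xr + (1.0 - t) * xf
        slope = (f(x_hat + eps) - f(x_hat - eps)) / (2.0 * eps)
        total += (abs(slope) - 1.0) ** 2
    return lam * total / len(real)


def train_linear_critic(real, fake, c=0.01, lr=0.05, steps=100):
    """Toy critic f(x) = w * x trained by gradient descent on critic_loss,
    clipping w after every step so f stays c-Lipschitz."""
    w = 0.0
    gap = sum(real) / len(real) - sum(fake) / len(fake)
    for _ in range(steps):
        # critic_loss for f(x) = w * x equals -w * gap, so its gradient is -gap.
        w = clip(w + lr * gap, c)
    return w


if __name__ == "__main__":
    real = [1.0, 2.0, 3.0, 4.0]
    fake = [4.0, 5.0, 6.0, 7.0]  # same shape, shifted by 3
    w = train_linear_critic(real, fake, c=0.01)
    estimate = -critic_loss(lambda x: w * x, real, fake) / 0.01  # undo c-scaling
    print("W1 estimate:", estimate)
    print("penalty if f(x) = 3x:", gradient_penalty(lambda x: 3.0 * x, real, fake))
```

With a linear critic, clipping caps |w| at c, so the trained critic's estimate is c times the true distance; dividing by c gives about 3 here, the earth mover distance for a pure shift of 3 units. The penalty version instead leaves the weights free and pushes the critic's slope toward 1, so a critic with slope 3 pays lam × (3 − 1)² = 40.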

Why it helps

  • Training is more stable and less prone to mode collapse.
  • You can train the critic close to optimality, typically with several critic updates per generator update, without the generator's gradients vanishing.
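The second point can be seen in a deliberately tiny, hypothetical setup: a generator that only learns a shift theta, and a linear critic f(x) = w * x with clipped weights. train_toy_wgan is an illustrative helper, not the paper's training loop; n_critic = 5 and c = 0.01 follow the defaults reported for the original WGAN, while the learning rates are arbitrary choices for this toy.

```python
def train_toy_wgan(real_mean=3.0, theta=-5.0, c=0.01, n_critic=5,
                   critic_lr=0.05, gen_lr=10.0, gen_steps=200):
    """Hypothetical 1-D WGAN: the generator outputs fixed offsets shifted by a
    learnable theta; the critic is f(x) = w * x with w clipped to [-c, c]."""
    offsets = [-1.0, 0.0, 1.0]
    real = [real_mean + d for d in offsets]
    w = 0.0
    history = [theta]
    for _ in range(gen_steps):
        fake = [theta + d for d in offsets]
        gap = sum(real) / len(real) - sum(fake) / len(fake)
        # n_critic critic updates per generator update. The critic loss
        # E[f(fake)] - E[f(real)] equals -w * gap, so its gradient in w is -gap.
        for _ in range(n_critic):
            w = max(-c, min(c, w + critic_lr * gap))
        # Generator loss -E[f(fake)] = -w * (theta + mean(offsets)); its gradient
        # in theta is -w, so a descent step moves theta by +gen_lr * w.
        theta += gen_lr * w
        history.append(theta)
    return theta, history


if __name__ == "__main__":
    theta, history = train_toy_wgan()
    print("first steps:", [round(t, 2) for t in history[:4]])
    print("final theta:", round(theta, 2))
```

Even though the generator starts 8 units away, the critic trained to its clipped optimum keeps handing it a constant-size gradient, so theta walks steadily toward 3 and then hovers within one step of it.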

Key idea

A WGAN replaces the classifier-based loss with the Wasserstein distance and uses a Lipschitz-constrained critic, giving smoother gradients, more stable training, and a loss that tracks sample quality.

Check yourself

1. What does a WGAN critic output?

2. Why is the Wasserstein distance helpful for training?

3. What constraint must the critic satisfy?