What it is
A diffusion model generates data by learning to reverse a slow corruption process. It first defines how to destroy data with noise, then trains a network to undo that noise one small step at a time.
Forward and reverse
There are two processes that mirror each other.
- The forward process adds a little Gaussian noise to a sample at each of many steps, until almost nothing of the original remains and the result is close to pure noise
- The reverse process is a learned network that removes a little noise at each step, walking back from noise toward clean data
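The forward process can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the linear noise schedule, its endpoints, the step count, and the names `make_alpha_bar` and `forward_noise` are all assumptions chosen for the example. It relies on the fact that adding small Gaussian noise repeatedly is equivalent to a single Gaussian jump to any step t.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal variance kept after each step.

    Assumes a linear schedule of per-step noise variances (betas).
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_noise(x0, t, alpha_bar, rng):
    """Jump straight to step t of the forward process.

    x_t = sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * eps
    Returns the noisy sample and the noise that was mixed in.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = np.ones(4)                                        # a toy "clean" sample
x_early, _ = forward_noise(x0, 0, alpha_bar, rng)      # barely changed
x_late, eps_late = forward_noise(x0, 999, alpha_bar, rng)  # almost all noise
```

Under this assumed schedule, `alpha_bar` shrinks toward zero, so early steps leave the sample nearly intact while the last step is dominated by the noise term.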
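In the most common setup, the network receives a noisy sample together with its step index and is trained to predict the noise that was added. Once trained, you start from random noise and run the reverse steps in order, from the last step back to the first, to produce a fresh sample.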
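The training target and the sampling loop can be sketched as follows. This is a toy under stated assumptions, not a real model: the schedule values are assumed, the "dataset" is a single point `MU`, and `predict_noise` is a hypothetical stand-in for the trained network that uses the closed-form answer available for such a dataset. The reverse update shown is the standard ancestral sampling rule with per-step noise variance equal to beta, which is one common choice among several.

```python
import numpy as np

# Linear noise schedule (an assumed, commonly used choice).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

# Toy dataset: every clean sample equals MU, so the ideal noise
# prediction has a closed form and no actual training is needed here.
MU = np.array([2.0, -1.0])

def predict_noise(x_t, t):
    """Hypothetical stand-in for a trained network eps_theta(x_t, t)."""
    return (x_t - np.sqrt(alpha_bar[t]) * MU) / np.sqrt(1.0 - alpha_bar[t])

def noise_prediction_loss(x0, rng):
    """Mean squared error between the added noise and the predicted noise."""
    t = rng.integers(0, T)                   # pick a random step
    eps = rng.standard_normal(x0.shape)      # noise actually added
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return np.mean((eps - predict_noise(x_t, t)) ** 2)

def sample(shape, rng):
    """Start from random noise and apply the reverse step T times."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        eps_hat = predict_noise(x, t)
        # Remove the predicted noise contribution for this step.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
        # Fresh noise is added on every step except the final one.
        z = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        x = mean + np.sqrt(betas[t]) * z
    return x

rng = np.random.default_rng(0)
loss = noise_prediction_loss(MU, rng)
generated = sample(MU.shape, rng)
```

Because the stand-in predictor is exact for this one-point dataset, `loss` is essentially zero and `generated` lands on `MU`. With real data there is no closed form, so a neural network is trained by gradient descent to minimize the same loss, and the samples vary from run to run.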
Why it matters
Diffusion models power many modern image and audio generators.
- They produce high-quality, diverse samples and largely avoid the mode collapse often seen in adversarial training
- Generation is slower than with single-pass generators because each sample needs many reverse steps, though faster samplers reduce the count
- They can be conditioned on text prompts to guide what gets generated
Key idea
A diffusion model learns to reverse a step-by-step noising process, turning random noise into realistic new data.