Score-Based Models
Score-based models offer another lens on diffusion. Instead of predicting noise, they learn the score, the gradient of the log probability density with respect to the data, and use it to walk toward high-density regions.
What the score is
- The score at a point is a vector pointing in the direction in which data density increases fastest.
- A network called a score network estimates this gradient for noisy data at many noise levels.
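- Training uses denoising score matching: with Gaussian noise of scale sigma, the regression target is the negative of the added noise divided by sigma, so the objective matches noise prediction in diffusion models up to a per-noise-level weighting.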
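
To make the link between the score and noise prediction concrete, here is a minimal sketch in Python with NumPy. It assumes 1-D data and Gaussian noise; the function names gaussian_score and dsm_target are illustrative, not taken from any library.

```python
import numpy as np

def gaussian_score(x, mu=0.0, sigma=1.0):
    """Score of N(mu, sigma^2): d/dx log p(x) = -(x - mu) / sigma^2."""
    return -(x - mu) / sigma**2

def dsm_target(x_clean, x_noisy, sigma):
    """Denoising score matching target (illustrative helper).

    This is the score of the Gaussian noising kernel
    q(x_noisy | x_clean) = N(x_clean, sigma^2), evaluated at x_noisy.
    """
    return -(x_noisy - x_clean) / sigma**2

rng = np.random.default_rng(0)
sigma = 0.5                          # assumed noise level
x_clean = rng.normal(size=5)         # stand-in for real data
eps = rng.normal(size=5)             # standard Gaussian noise
x_noisy = x_clean + sigma * eps      # noised data

target = dsm_target(x_clean, x_noisy, sigma)

# The target is the added noise, sign-flipped and divided by sigma,
# which is why regressing onto it resembles noise prediction.
same_as_scaled_noise = np.allclose(target, -eps / sigma)
```

In this sketch a score network would be trained to output target given x_noisy and sigma; a noise-prediction network would instead output eps, and the two differ only by the factor -1/sigma.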
Sampling by following the score
- Start from random noise.
- Repeatedly nudge the sample along the estimated score, adding a little randomness each step.
- This procedure, Langevin dynamics, drifts samples toward regions where real data is dense.
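
The sampling steps above can be sketched as follows. This is a minimal illustration, assuming the score is known in closed form (a 1-D Gaussian with mean 3 and unit variance) rather than estimated by a trained network. The names langevin_sample and target_score, the step size, and the step count are all illustrative choices.

```python
import numpy as np

def langevin_sample(score_fn, x0, step_size, n_steps, rng):
    """Unadjusted Langevin dynamics:
    x <- x + (step_size / 2) * score(x) + sqrt(step_size) * z,  z ~ N(0, 1).
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x + 0.5 * step_size * score_fn(x) + np.sqrt(step_size) * noise
    return x

def target_score(x, mu=3.0, sigma=1.0):
    """Closed-form score of N(mu, sigma^2), standing in for a score network."""
    return -(x - mu) / sigma**2

rng = np.random.default_rng(0)
x0 = rng.normal(size=5000)           # start from random noise
samples = langevin_sample(target_score, x0, step_size=0.1, n_steps=500, rng=rng)

# samples should now be distributed approximately as N(3, 1),
# up to a small bias from the finite step size.
```

The injected noise matters: without it, every sample would simply climb to the nearest density peak instead of spreading out according to the distribution.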
Why multiple noise levels
- At low noise the score is sharp, but the estimate is only accurate near real data, because training rarely sees points far from it.
- At high noise the score is smooth and guides samples from anywhere.
- Annealing from high to low noise combines broad guidance with fine detail, and mirrors the reverse process of a diffusion model, which also denoises step by step from high to low noise.
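
The annealing idea can be sketched by running Langevin dynamics at a sequence of decreasing noise levels. This is a minimal illustration under stated assumptions: the "data" is two points at -2 and +2, so the noise-perturbed score has a closed form and no network is needed; the step size is scaled in proportion to sigma squared, which is one common heuristic and not the only option. The names perturbed_score and annealed_langevin and all numeric settings are illustrative.

```python
import numpy as np

M = 2.0  # assumed toy data: two equally likely points at -M and +M

def perturbed_score(x, sigma):
    """Score of the data after adding N(0, sigma^2) noise.

    The perturbed density is 0.5*N(-M, sigma^2) + 0.5*N(+M, sigma^2),
    whose score simplifies to (M * tanh(M * x / sigma^2) - x) / sigma^2.
    """
    return (M * np.tanh(M * x / sigma**2) - x) / sigma**2

def annealed_langevin(score_fn, sigmas, steps_per_level, base_step, rng, n_samples):
    """Langevin dynamics run at each noise level, from high to low."""
    x = rng.normal(scale=sigmas[0], size=n_samples)   # start from broad noise
    for sigma in sigmas:
        # Heuristic (assumption): step size proportional to sigma^2.
        step = base_step * (sigma / sigmas[-1]) ** 2
        for _ in range(steps_per_level):
            z = rng.normal(size=x.shape)
            x = x + 0.5 * step * score_fn(x, sigma) + np.sqrt(step) * z
    return x

rng = np.random.default_rng(0)
sigmas = np.geomspace(5.0, 0.1, 10)   # high noise down to low noise
samples = annealed_langevin(perturbed_score, sigmas, steps_per_level=100,
                            base_step=2e-3, rng=rng, n_samples=2000)

frac_positive = np.mean(samples > 0)  # share of samples near +M
```

At the high noise levels the score pulls samples in from anywhere, and at the low levels it tightens them around the two data points, so the final samples cluster near -2 and +2.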
Key idea
Score-based models learn the gradient of the log density and sample by following that score with Langevin dynamics across annealed noise levels, a formulation equivalent to denoising diffusion.