Classifier-Free Guidance
Classifier-free guidance (CFG) is the technique that lets diffusion models follow a text prompt closely without a separate classifier. It steers generation by blending two noise predictions from the same network.
The two predictions
- During training, the condition is randomly dropped for a fraction of examples, so a single network learns both a conditional and an unconditional denoiser.
- At sampling time the network is run twice per step, predicting the noise once with the prompt and once without it.
- The difference between them points toward the prompt.
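The training-time dropout described above can be sketched in NumPy. This is a minimal illustration, not any particular library's implementation: the helper name `drop_condition`, the all-zeros `null_cond` standing in for an "empty prompt" embedding, and the drop probability of 0.1 are all assumptions made for the example.

```python
import numpy as np

def drop_condition(cond, null_cond, p_drop, rng):
    """Replace each row of `cond` with `null_cond` with probability `p_drop`.

    Hypothetical helper: the denoiser would then be trained on the returned
    batch, so it sees both conditioned and unconditioned examples.
    """
    # One independent coin flip per example in the batch.
    drop = rng.random(cond.shape[0]) < p_drop
    # Broadcast the per-example mask across the embedding dimension.
    mixed = np.where(drop[:, None], null_cond[None, :], cond)
    return mixed, drop

rng = np.random.default_rng(0)
cond = rng.normal(size=(8, 4))   # toy batch of 8 prompt embeddings
null_cond = np.zeros(4)          # assumed "empty prompt" embedding
mixed, drop = drop_condition(cond, null_cond, p_drop=0.1, rng=rng)
```

Rows where `drop` is true carry the null embedding, so the same network is trained on the unconditional task for those examples and on the conditional task for the rest.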
The guidance scale
- Form the guided prediction as the unconditional prediction pushed in the conditional direction: guided = uncond + scale * (cond - uncond).
- A guidance scale controls how hard to push. A scale of one returns the plain conditional prediction, meaning no extra guidance.
- Higher scales make the output match the prompt more closely but can reduce diversity and oversaturate the image.
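The combination step is simple enough to write out directly. The sketch below assumes the two noise predictions are already available as arrays; the function name `guided_noise` and the toy values are illustrative and not taken from any specific library.

```python
import numpy as np

def guided_noise(eps_uncond, eps_cond, scale):
    """Push the unconditional prediction toward the conditional one."""
    # (eps_cond - eps_uncond) is the direction that points toward the prompt.
    return eps_uncond + scale * (eps_cond - eps_uncond)

eps_uncond = np.array([0.0, 1.0])   # toy prediction without the prompt
eps_cond = np.array([1.0, 3.0])     # toy prediction with the prompt

same = guided_noise(eps_uncond, eps_cond, 1.0)    # equals eps_cond: no extra guidance
pushed = guided_noise(eps_uncond, eps_cond, 3.0)  # [3.0, 7.0]: extrapolates past eps_cond
```

With a scale above one the result lies beyond the conditional prediction along the same direction, which is where the stronger prompt adherence, and the risk of oversaturation, comes from.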
Why it matters
- It removes the need to train a separate classifier on noisy images and take gradients through it, an approach that was fragile.
- It is the main knob behind prompt adherence in modern text-to-image systems.
- Tuning the scale trades fidelity to the prompt against sample variety.
Key idea
Classifier-free guidance trains one network to denoise with and without the condition, then amplifies the difference between the two predictions by a guidance scale to steer samples toward the prompt, trading diversity for fidelity.