← Lessons

quiz vs the machine

Gold1470

Machine Learning

The Latent Diffusion

Run diffusion in a compressed latent space to slash compute while keeping quality.

5 min read · core · beat Gold to climb

The Latent Diffusion

Latent diffusion makes high resolution image generation practical by moving the diffusion process out of pixel space and into a smaller learned latent space.

The two stage design

  • An autoencoder first compresses images into a compact latent representation and can decode them back.
  • A diffusion model then runs entirely in that latent space, adding and removing noise on small tensors instead of full images.
  • A final decoder turns the generated latent back into a full resolution image.

Why it saves so much

  • The latent is far smaller than the pixel grid, so each diffusion step is much cheaper.
  • The autoencoder strips away imperceptible detail, letting the diffusion model focus on semantic structure.
  • This is what made running powerful text to image models on a single consumer GPU feasible.

Conditioning

  • Text or other conditions are injected through cross attention layers inside the latent denoiser.
  • This is where prompts steer the generated latent before it is decoded.

Key idea

Latent diffusion compresses images with an autoencoder and runs the diffusion process in the small latent space, dramatically cutting compute while keeping high quality and enabling prompt conditioning through cross attention.

Check yourself

Answer to earn rating on the learn ladder.

1. Where does latent diffusion run the diffusion process?

2. Why does latent diffusion save compute?

3. How are prompts injected into the latent denoiser?