Machine Learning

Pruning and Sparsity

Removing weights to make models smaller and sometimes faster.

5 min read · core

Most weights are redundant

Large networks are overparameterized, so many weights contribute little to the output. Pruning removes the least important weights, leaving a sparse model with fewer nonzero parameters.
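One way to make "least important" concrete is to rank weights by absolute value and zero the smallest ones. The lesson does not name a specific importance score, so treat magnitude as an assumption here; `magnitude_prune` and the example weights are hypothetical. A minimal sketch:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of a flat list of weights.

    Assumption: absolute value is used as the importance score.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, 0.01, -1.2, 0.02, 0.6, -0.1]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.3, 0.0, -1.2, 0.0, 0.6, 0.0]
```

Half the entries become zero, but the list keeps its original length: the model is sparse, not yet smaller in shape.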

Unstructured versus structured

  • Unstructured pruning zeros out individual weights anywhere in the network. It reaches high sparsity, but the scattered zeros are hard for hardware to exploit.
  • Structured pruning removes whole channels, filters, or attention heads. It gives real speedups because the remaining computation is smaller but still dense and regular.

The trade-off is flexibility versus hardware friendliness.
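To illustrate the structured case, the sketch below treats each row of a small weight matrix as one output channel and drops whole rows. Ranking rows by L2 norm is an assumption, not something the lesson prescribes, and `prune_rows` and `layer` are hypothetical names.

```python
import math


def prune_rows(matrix, keep):
    """Structured pruning: keep the `keep` rows with the largest L2 norm.

    Each row stands in for one output channel. The result is a smaller
    dense matrix, not a same-sized matrix with zeros scattered in it.
    """
    norms = [math.sqrt(sum(w * w for w in row)) for row in matrix]
    # Rank rows by norm, largest first, then restore the original row order.
    ranked = sorted(range(len(matrix)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:keep])
    return [matrix[i] for i in kept], kept


layer = [
    [0.9, -0.7, 0.4],
    [0.01, 0.02, -0.01],
    [-0.5, 0.6, 0.8],
    [0.03, -0.02, 0.0],
]
smaller, kept = prune_rows(layer, keep=2)
print(kept)     # [0, 2]
print(smaller)  # [[0.9, -0.7, 0.4], [-0.5, 0.6, 0.8]]
```

The output is a 2×3 dense matrix, so an ordinary matrix multiply does half the work with no special sparse kernel. In a real network, the inputs of the following layer that consumed the removed channels must be dropped as well.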

A typical workflow

Pruning usually alternates with fine-tuning: remove some weights, retrain briefly to recover lost accuracy, and repeat until the target sparsity is reached.
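The alternation can be sketched on a toy linear model. Everything here is an illustrative assumption: the synthetic data, the learning rate, the step counts, the magnitude-based pruning rule, and the helper names `fine_tune` and `prune_smallest`.

```python
import random


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def fine_tune(w, mask, data, steps=100, lr=0.3):
    """Gradient descent on mean squared error; pruned weights stay at zero."""
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        # Only weights whose mask entry is True are updated.
        w = [wj - lr * gj if keep else 0.0 for wj, gj, keep in zip(w, grad, mask)]
    return w


def prune_smallest(w, n_zero):
    """Return a mask that drops the n_zero smallest-magnitude weights."""
    order = sorted(range(len(w)), key=lambda j: abs(w[j]))
    dropped = set(order[:n_zero])
    return [j not in dropped for j in range(len(w))]


rng = random.Random(0)
true_w = [2.0, 0.1, -3.0, 0.05]  # toy target: two weights matter, two barely do
data = []
for _ in range(64):
    x = [rng.uniform(-1.0, 1.0) for _ in true_w]
    data.append((x, predict(true_w, x)))

mask = [True] * len(true_w)
w = fine_tune([0.0] * len(true_w), mask, data)  # train the dense model first
for n_zero in (1, 2):                           # raise sparsity gradually
    mask = prune_smallest(w, n_zero)
    w = fine_tune(w, mask, data)                # recover accuracy, mask fixed

print(mask)
print([round(wj, 2) for wj in w])
```

On this toy problem the two near-zero weights are the ones pruned, and the surviving weights settle close to their original values after each fine-tuning round. Raising sparsity in small steps gives the remaining weights a chance to compensate before more are removed.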

Realizing the speedup

Sparsity only saves time if the hardware can skip the zeros. Some GPUs support structured sparsity patterns, such as two nonzeros in every group of four, that the tensor cores accelerate directly. Without such support, unstructured sparsity mainly saves storage rather than compute.
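The two-in-four pattern can be produced with a few lines of code. This sketch only shows the pattern itself; it assumes magnitude decides which two values survive in each group, and `prune_two_of_four` is a hypothetical helper, not a library function. Any speedup still depends on hardware and kernels that recognize the pattern.

```python
def prune_two_of_four(weights):
    """Keep the 2 largest-magnitude values in each consecutive group of 4."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    out = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Positions of the two largest magnitudes within this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        out.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return out


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.1]
sparse_row = prune_two_of_four(row)
print(sparse_row)  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Every group of four ends up with exactly two nonzeros, so the overall sparsity is fixed at 50% and the position of the zeros is constrained enough for hardware to plan around.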

Key idea

Pruning removes unimportant weights to create sparse models, and structured patterns matched to hardware turn that sparsity into real speed, not just smaller size.

Check yourself

1. Why is structured pruning often more useful for speed?

2. When does unstructured sparsity fail to give a speedup?