Machine Learning

Model Pruning

Remove weights or whole structures to shrink a trained network with little accuracy loss.

5 min read · advanced

The idea

Trained networks are usually overparameterized: many weights contribute little to the output. Pruning removes those low-importance weights to make the model smaller and often faster, while trying to preserve accuracy.

Unstructured versus structured

  • Unstructured pruning zeroes out individual weights, typically those with the smallest magnitude. It can remove a large fraction of weights, but the matrix keeps its original shape and only becomes sparse, so ordinary hardware does not run it faster without sparse kernels or dedicated hardware support.
  • Structured pruning removes whole units such as channels, filters, or attention heads. It yields a genuinely smaller dense model that runs faster on standard hardware, at the cost of coarser granularity: each removal takes out many weights at once.
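
To make the contrast concrete, here is a minimal sketch in plain Python. The helper names `magnitude_prune` and `prune_rows`, the toy matrix `W`, and the choice of L1 norm as the importance score for a row are all assumptions made for illustration; real frameworks offer their own pruning utilities and other importance criteria.

```python
def magnitude_prune(weights, sparsity):
    """Unstructured: zero the smallest-magnitude fraction of individual weights."""
    n_cols = len(weights[0])
    # Rank every (row, col) position by absolute value, smallest first.
    positions = sorted(
        ((r, c) for r in range(len(weights)) for c in range(n_cols)),
        key=lambda rc: abs(weights[rc[0]][rc[1]]),
    )
    n_prune = int(len(positions) * sparsity)
    pruned = [row[:] for row in weights]
    for r, c in positions[:n_prune]:
        pruned[r][c] = 0.0
    return pruned


def prune_rows(weights, n_remove):
    """Structured: drop the n_remove rows (output units) with the smallest L1 norm."""
    norms = [sum(abs(w) for w in row) for row in weights]
    by_norm = sorted(range(len(weights)), key=lambda i: norms[i])
    keep = sorted(by_norm[n_remove:])  # preserve the original row order
    return [weights[i][:] for i in keep]


# Hypothetical 4x3 weight matrix: 4 output units, 3 inputs each.
W = [
    [0.9, -0.05, 0.4],
    [0.02, 0.01, -0.03],
    [-0.7, 0.6, 0.08],
    [0.3, -0.2, 0.5],
]

sparse_W = magnitude_prune(W, 0.5)  # still 4x3, but half the entries are zero
small_W = prune_rows(W, 1)          # now 3x3: a smaller dense matrix
```

Note that `sparse_W` has the same shape as `W`, so a dense matrix multiply does the same amount of work as before, while `small_W` is simply a smaller matrix.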

The recipe

A common loop is train, prune, then fine-tune to recover the accuracy lost when weights were removed. Repeating this in small steps, called iterative pruning, typically reaches higher sparsity at a given accuracy than removing everything at once.
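
The train, prune, fine-tune loop can be sketched on a toy problem. Everything below is an illustrative assumption rather than a prescribed method: the model is a linear regressor trained by full-batch gradient descent, the data are synthetic, the helpers `train` and `prune_to` are hypothetical, and the sparsity schedule (25%, 50%, 62.5%) is arbitrary.

```python
import random


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train(w, mask, X, y, steps, lr=0.05):
    """Gradient descent on mean squared error; masked weights are held at zero."""
    n = len(X)
    for _ in range(steps):
        errors = [predict(w, x) - t for x, t in zip(X, y)]
        for j in range(len(w)):
            grad = 2.0 / n * sum(e * x[j] for e, x in zip(errors, X))
            w[j] = (w[j] - lr * grad) * mask[j]
    return w


def prune_to(w, mask, sparsity):
    """Zero the smallest live weights until `sparsity` of all weights are pruned."""
    target = int(len(w) * sparsity)
    already = mask.count(0.0)
    live = sorted((j for j in range(len(w)) if mask[j]), key=lambda j: abs(w[j]))
    for j in live[: max(0, target - already)]:
        mask[j] = 0.0
        w[j] = 0.0


# Synthetic data: only 3 of the 8 true weights are non-zero.
rng = random.Random(0)
true_w = [2.0, 0.0, -3.0, 0.0, 0.0, 1.5, 0.0, 0.0]
X = [[rng.gauss(0, 1) for _ in true_w] for _ in range(64)]
y = [predict(true_w, x) for x in X]

w = [0.0] * len(true_w)
mask = [1.0] * len(true_w)

train(w, mask, X, y, steps=300)        # 1. train the dense model
for sparsity in (0.25, 0.5, 0.625):    # 2. prune a little more each round
    prune_to(w, mask, sparsity)
    train(w, mask, X, y, steps=100)    # 3. fine-tune the surviving weights

survivors = [j for j, m in enumerate(mask) if m]
```

The mask is what keeps pruned weights at zero during fine-tuning; without it, gradient updates would bring them back.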

The lottery ticket hypothesis is the observation that a pruned subnetwork, when its surviving weights are reset to their original initial values, can sometimes train to full accuracy on its own.
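
The procedure behind that observation (train, derive a mask, rewind to the saved initialization, retrain) can be sketched as follows. This is a toy under stated assumptions: a linear model on synthetic data, a hypothetical `mse_step` helper, and a mask that keeps the two largest trained weights. It shows the mechanics only and says nothing about how deep networks behave.

```python
import random


def mse_step(w, mask, X, y, lr=0.05):
    """One masked gradient-descent step on mean squared error."""
    n = len(X)
    errors = [sum(wi * xi for wi, xi in zip(w, x)) - t for x, t in zip(X, y)]
    return [
        (wj - lr * 2.0 / n * sum(e * x[j] for e, x in zip(errors, X))) * mask[j]
        for j, wj in enumerate(w)
    ]


rng = random.Random(1)
true_w = [1.0, 0.0, -2.0, 0.0, 0.0, 0.0]
X = [[rng.gauss(0, 1) for _ in true_w] for _ in range(48)]
y = [sum(t * xi for t, xi in zip(true_w, x)) for x in X]

# Save the initialization BEFORE training; the rewind step needs it.
init_w = [rng.uniform(-0.1, 0.1) for _ in true_w]
dense_mask = [1.0] * len(true_w)

# Train the dense model from the saved initialization.
w = init_w[:]
for _ in range(300):
    w = mse_step(w, dense_mask, X, y)

# Keep the 2 largest-magnitude trained weights; this mask defines the subnetwork.
keep = sorted(range(len(w)), key=lambda j: abs(w[j]))[-2:]
mask = [1.0 if j in keep else 0.0 for j in range(len(w))]

# Rewind: survivors return to their ORIGINAL initial values, not the trained ones.
ticket = [wj * m for wj, m in zip(init_w, mask)]
for _ in range(300):
    ticket = mse_step(ticket, mask, X, y)
```

The key detail is that the mask comes from the trained weights while the starting values come from `init_w`.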

Key idea

Pruning removes low-importance weights or structures from a trained model. Unstructured pruning gives high sparsity that needs special support to run faster, while structured pruning yields a smaller dense model that runs faster on standard hardware.

Check yourself

1. Why does unstructured pruning often fail to speed up inference on normal hardware?

2. What does structured pruning remove?

3. Why is fine-tuning part of the pruning recipe?