Model Pruning

Remove weights or whole structures to shrink a trained network with little accuracy loss.

The idea

Trained networks are usually overparameterized: many weights contribute little to the output. Pruning removes those low-importance weights to make the model smaller and often faster while keeping accuracy as close to the original as possible.

Unstructured versus structured

Unstructured pruning zeroes out individual weights, typically the smallest in magnitude. It can remove a large fraction of weights, but the weight matrix keeps its shape and merely becomes sparse, so ordinary hardware runs it no faster unless sparse kernels or hardware sparsity support are available.
Structured pruning removes whole units such as channels, filters, or attention heads. It yields a genuinely smaller dense model that runs faster on standard hardware, at the cost of coarser granularity: every removal discards many weights at once.
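The difference can be sketched on a small weight matrix. This is a minimal illustration in plain Python, not a library API: the function names `magnitude_prune` and `prune_rows`, the example matrix `W`, and the use of a row's L1 norm as its importance score are all assumptions made for the example.

```python
def magnitude_prune(weights, sparsity):
    """Unstructured: zero the `sparsity` fraction of entries with the smallest magnitude."""
    cells = sorted((abs(w), i, j)
                   for i, row in enumerate(weights)
                   for j, w in enumerate(row))
    k = int(len(cells) * sparsity)          # number of weights to remove
    pruned = [row[:] for row in weights]    # copy; the shape is unchanged
    for _, i, j in cells[:k]:
        pruned[i][j] = 0.0
    return pruned

def prune_rows(weights, keep):
    """Structured: keep only the `keep` rows (output units) with the largest L1 norm."""
    norms = sorted(((sum(abs(w) for w in row), i)
                    for i, row in enumerate(weights)), reverse=True)
    kept = sorted(i for _, i in norms[:keep])   # preserve the original row order
    return [weights[i][:] for i in kept]

# Hypothetical 3x3 layer: one row per output unit.
W = [[0.9, -0.05, 0.4],
     [0.01, 0.02, -0.03],
     [-0.7, 0.6, 0.1]]

sparse = magnitude_prune(W, 0.5)   # still 3x3, with four entries set to zero
dense = prune_rows(W, 2)           # a smaller 2x3 dense matrix
```

Here `sparse` has the same shape as `W`, so a dense matrix multiply does the same amount of work as before, while `dense` is a smaller matrix outright. In a real network, removing an output unit from one layer also means removing the matching input column from the next layer.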

The recipe

A common loop is train, prune, then fine-tune to recover the accuracy lost when weights were removed. Repeating the prune and fine-tune steps in small increments, called iterative pruning, typically reaches higher sparsity than removing everything at once.
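The loop can be sketched on a toy linear model. Everything here is an assumption chosen to keep the example small and dependency-free: the helper names `train` and `prune_smallest`, the synthetic data, the learning rate, and the choice of two rounds that each remove half of the remaining weights.

```python
import random

def train(w, mask, data, steps=200, lr=0.05):
    """Gradient descent on mean squared error; pruned weights are held at zero."""
    w = [wi * m for wi, m in zip(w, mask)]
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / len(data)
        w = [(wi - lr * g) * m for wi, g, m in zip(w, grad, mask)]
    return w

def prune_smallest(w, mask, fraction):
    """Drop `fraction` of the still-active weights, smallest magnitude first."""
    active = sorted((abs(w[j]), j) for j in range(len(w)) if mask[j])
    new_mask = mask[:]
    for _, j in active[:int(len(active) * fraction)]:
        new_mask[j] = 0
    return new_mask

# Toy data: only three of the eight inputs actually matter (assumed for illustration).
random.seed(0)
true_w = [2.0, 0.0, 0.0, -1.5, 0.0, 0.0, 0.0, 0.5]
data = []
for _ in range(64):
    x = [random.gauss(0, 1) for _ in true_w]
    data.append((x, sum(t * xi for t, xi in zip(true_w, x))))

mask = [1] * len(true_w)
w = train([0.0] * len(true_w), mask, data)   # train
for _ in range(2):                           # two prune and fine-tune rounds
    mask = prune_smallest(w, mask, 0.5)      # prune half of what is left
    w = train(w, mask, data)                 # fine-tune the survivors
```

After the two rounds, 75% of the weights are masked out and the survivors have been fine-tuned with the mask held fixed. A real pipeline would measure validation accuracy after each round and stop when the loss becomes unacceptable.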

The lottery ticket hypothesis is the observation that a pruned subnetwork, with its surviving weights reset to their original initial values, can sometimes train to the full network's accuracy on its own.
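The reset step can be sketched in a few lines. The weight values below are hypothetical, as is the choice to keep the half of the weights with the largest trained magnitude; the point is only that the mask comes from the trained weights while the starting values come from initialization.

```python
# Weights saved at initialization, before any training (hypothetical values).
initial = [0.31, -0.12, 0.05, -0.44, 0.27, -0.08]
# The same weights after training (hypothetical values).
trained = [1.20, -0.02, 0.01, -0.95, 0.60, -0.03]

# Build the mask from the trained weights: keep the largest half by magnitude.
keep = len(trained) // 2
ranked = sorted(range(len(trained)), key=lambda j: abs(trained[j]), reverse=True)
kept = set(ranked[:keep])
mask = [1 if j in kept else 0 for j in range(len(trained))]

# Rewind: survivors return to their initial values, pruned weights stay at zero.
ticket = [w0 if m else 0.0 for w0, m in zip(initial, mask)]
```

The subnetwork in `ticket` would then be trained from this starting point with the mask held fixed, and its accuracy compared against the full network.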

Key idea

Pruning removes low-importance weights or structures from a trained model; unstructured pruning gives high sparsity that needs special support to run faster, while structured pruning yields a smaller dense model that runs faster on standard hardware.
