Dropout Regularization

Randomly switching off

Dropout is a regularization method that sets the outputs of a randomly chosen fraction of neurons to zero on each training step. Every forward pass effectively trains a slightly different thinned network, and the full network at test time approximates an average of all of them.

Drop rate is the probability that each neuron is zeroed, commonly between 0.2 and 0.5.
Each training step samples a fresh random mask.
Test time uses all neurons, with scaling applied so the expected signal matches training.
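
The masking step can be sketched in a few lines of NumPy. This is a minimal illustration, not any framework's implementation; the helper name dropout_mask, the array shape, and the drop rate of 0.5 are assumptions chosen for the example.

```python
import numpy as np

def dropout_mask(shape, drop_rate, rng):
    """Hypothetical helper: 0/1 mask where each entry is 0 with probability drop_rate."""
    # rng.random draws uniform values in [0, 1); entries below drop_rate are dropped.
    return (rng.random(shape) >= drop_rate).astype(np.float64)

rng = np.random.default_rng(0)      # seeded only to make the sketch repeatable
activations = np.ones((4, 1000))    # stand-in for one layer's outputs
drop_rate = 0.5                     # assumed value for illustration

# A fresh mask is sampled on every training step, so the two masks differ.
mask_step1 = dropout_mask(activations.shape, drop_rate, rng)
mask_step2 = dropout_mask(activations.shape, drop_rate, rng)

# Applying the mask zeroes the dropped neurons and leaves the rest untouched.
dropped = activations * mask_step1
```

Because each entry is drawn independently, the fraction of zeroed neurons is only approximately equal to the drop rate on any single step.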

Why it fights overfitting

Without dropout, neurons can co-adapt, relying on specific partners to fix their mistakes. Dropout breaks these fragile partnerships by making every neuron unreliable, forcing each one to learn features that are useful on their own. The network becomes more robust and tends to generalize better.

Train versus test

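During training, neurons are dropped at random. At test time none are dropped, and scaling keeps the expected signal at each layer the same as it was during training. The original formulation multiplies outputs by the keep probability at test time; most current frameworks instead use inverted dropout, dividing the surviving activations by the keep probability during training so that nothing needs rescaling at test time. Frameworks handle this scaling automatically, but the model must be switched to evaluation mode so that dropout is turned off; otherwise predictions stay noisy and evaluation is incorrect.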
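
A small NumPy sketch of the inverted dropout variant shows how the training and test paths differ. The function name dropout_forward and its training flag are assumptions made for this example; real frameworks expose the same switch through their own evaluation-mode mechanisms.

```python
import numpy as np

def dropout_forward(x, drop_rate, training, rng=None):
    """Hypothetical inverted-dropout layer: scaling happens at training time."""
    if not training or drop_rate == 0.0:
        # Test path: every neuron is used and no rescaling is needed.
        return x
    keep_prob = 1.0 - drop_rate
    mask = rng.random(x.shape) < keep_prob   # True where the neuron survives
    # Dividing by keep_prob keeps the expected activation equal to x.
    return x * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((200, 500))             # stand-in activations, all equal to 1

train_out = dropout_forward(x, 0.5, training=True, rng=rng)
test_out = dropout_forward(x, 0.5, training=False)

print(train_out.mean())             # close to 1.0 on average
print(np.array_equal(test_out, x))  # test path leaves activations unchanged
```

With a drop rate of 0.5, each surviving activation is doubled during training, so the average signal stays near its undropped value even though half the neurons are zeroed.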

Key idea

Dropout randomly zeros neurons during training to break co-adaptation and force independently useful features, then uses the full network, appropriately scaled, at test time for better generalization.
