Randomly switching off
Dropout is a regularization method that randomly sets a fraction of neuron outputs to zero on each training step. Every forward pass effectively trains a slightly different thinned network, and the full network at test time approximates an average of all of them.
- Drop rate is the probability that each neuron is zeroed, commonly between 0.2 and 0.5.
- Each step picks a fresh random mask.
- At test time all neurons are used, with outputs scaled to compensate.
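The mask sampling described in the points above can be sketched in a few lines of NumPy. This is a minimal illustration, not any framework's implementation; the helper name `sample_mask`, the array shape, and the drop rate of 0.5 are assumptions chosen for the example.

```python
import numpy as np

def sample_mask(shape, drop_rate, rng):
    """Return a 0/1 mask; each entry is 0 with probability drop_rate."""
    return (rng.random(shape) >= drop_rate).astype(np.float64)

rng = np.random.default_rng(0)
activations = np.ones((4, 1000))  # hypothetical layer output: 4 examples, 1000 neurons
drop_rate = 0.5                   # assumed value for illustration

mask_a = sample_mask(activations.shape, drop_rate, rng)  # mask for one training step
mask_b = sample_mask(activations.shape, drop_rate, rng)  # fresh mask for the next step

thinned = activations * mask_a    # dropped neurons contribute exactly zero
```

Because the two masks are drawn independently, each step trains a different thinned network, and the fraction of zeros in each mask is close to the drop rate.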
Why it fights overfitting
Without dropout, neurons can co-adapt, relying on specific partners to fix their mistakes. Dropout breaks these fragile partnerships by making every neuron unreliable, forcing each one to learn features that are useful on their own. The network becomes more robust and generalizes better.
Train versus test
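During training, neurons are dropped at random. At test time none are dropped, but the outputs are scaled to match the expected total signal from training. Frameworks handle this scaling automatically (most apply it during training instead, dividing the surviving outputs by the keep probability), but the model must be switched to evaluation mode so that dropout is turned off; otherwise predictions remain random and evaluation is incorrect.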
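To make the scaling concrete, here is a hedged NumPy sketch of two common conventions: the classic scheme, which scales at test time, and the "inverted" scheme, which scales during training so that test time is a plain pass-through. The function names `dropout_classic` and `dropout_inverted`, the `training` flag, and the sample values are assumptions for illustration only.

```python
import numpy as np

def dropout_classic(x, drop_rate, training, rng=None):
    """Classic scheme: drop while training, scale by keep probability at test time."""
    keep = 1.0 - drop_rate
    if training:
        mask = rng.random(x.shape) < keep  # True where the neuron survives
        return x * mask
    return x * keep

def dropout_inverted(x, drop_rate, training, rng=None):
    """Inverted scheme: scale by 1/keep while training, identity at test time."""
    if not training:
        return x
    keep = 1.0 - drop_rate
    mask = rng.random(x.shape) < keep
    return x * mask / keep

rng = np.random.default_rng(0)
x = np.full(100_000, 2.0)  # hypothetical activations, all equal for easy comparison
p = 0.2                    # assumed drop rate

train_mean = dropout_classic(x, p, training=True, rng=rng).mean()
test_mean = dropout_classic(x, p, training=False).mean()
```

In both schemes the average signal seen in training matches the signal at test time: here `train_mean` and `test_mean` both land near 1.6, which is 2.0 times the keep probability of 0.8. Forgetting to pass `training=False` at evaluation leaves the random mask active.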
Key idea
Dropout randomly zeroes neurons during training to break co-adaptation and force independently useful features, then uses the full scaled network at test time for better generalization.