Dropout as Regularization
Dropout fights overfitting by randomly turning off neurons during training. This forces the network to spread what it learns across many units rather than rely on fragile partnerships between specific neurons.
How it works
- During training, each neuron is kept with a fixed probability p (the keep probability) and otherwise set to zero. The dropout rate is 1 - p.
- A fresh random mask is drawn every forward pass.
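- At test time all neurons stay on, and their activations are scaled by p so that their expected value matches what the next layer saw during training. Most frameworks implement the equivalent "inverted" form, which divides the surviving activations by p during training so that no scaling is needed at test time.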
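A minimal NumPy sketch of the mechanics above, assuming the inverted variant: survivors are divided by the keep probability during training, so the test-time pass returns its input unchanged. The function name `dropout` and its parameters are illustrative, not any particular library's API.

```python
import numpy as np

def dropout(x, drop_rate, training, rng):
    """Inverted dropout (illustrative sketch, not a library API).

    Training: zero each unit with probability drop_rate and divide the
    survivors by keep_prob, so the expected activation is unchanged.
    Test: return x untouched, because the scaling already happened in training.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    if not training or drop_rate == 0.0:
        return x
    keep_prob = 1.0 - drop_rate
    # A fresh Bernoulli mask is drawn on every call, i.e. every forward pass.
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((4, 8))
train_out = dropout(x, 0.5, training=True, rng=rng)   # entries are 0.0 or 1 / 0.5 = 2.0
test_out = dropout(x, 0.5, training=False, rng=rng)   # identical to x
```

Because the rescaling happens during training, switching to evaluation only means skipping the mask, which is why frameworks tend to prefer this form.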
Why it helps
Neurons can no longer assume that a specific teammate will always be present, so they cannot form brittle co-adaptations. Each unit must carry useful signal on its own. The effect resembles training an exponentially large ensemble of thinned networks that share weights (a layer with n droppable units has 2^n possible masks) and averaging their predictions at test time.
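To make the ensemble view concrete, the sketch below uses a hypothetical single linear layer with made-up sizes and the non-inverted form of dropout (no rescaling during training). It averages the outputs of many randomly sampled thinned networks and compares that average with one pass of the full network scaled by the keep probability. For a purely linear layer the two agree in expectation; once nonlinear layers are involved, the scaled full network is only an approximation of the ensemble average.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, keep_prob = 10, 3, 0.8      # hypothetical sizes and keep probability
W = rng.normal(size=(n_out, n_in))       # weights shared by every thinned network
x = rng.normal(size=n_in)

# Each Bernoulli mask over the inputs selects one thinned network;
# with n_in = 10 there are 2**10 = 1024 of them, all sharing W.
n_samples = 200_000
masks = rng.random((n_samples, n_in)) < keep_prob
ensemble_mean = ((masks * x) @ W.T).mean(axis=0)   # Monte Carlo ensemble average

# Test-time rule: one pass with every unit on, scaled by keep_prob.
scaled = keep_prob * (W @ x)
```

Comparing `ensemble_mean` with `scaled` shows them matching up to sampling noise, which is why a single scaled pass can stand in for averaging the whole ensemble.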
Practical notes
Dropout rates (the probability of dropping a unit) of twenty to fifty percent are common in fully connected layers. Too much dropout starves the network of signal and slows learning, while too little provides almost no regularization. Dropout is less common inside convolutional layers and is often replaced by normalization in very deep modern architectures, but it remains a cheap and effective regularizer.
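A sketch of the typical placement, assuming a small ReLU multilayer perceptron with made-up layer widths: dropout at a rate of 0.5 follows each hidden fully connected layer, is skipped on the output layer, and is switched off at evaluation. The helper names and the initialization are hypothetical. In a framework you would normally use a built-in layer instead, for example PyTorch's `torch.nn.Dropout(p=0.5)`, which is disabled by calling `model.eval()`.

```python
import numpy as np

def init_layer(rng, n_in, n_out):
    # Small random weights and zero biases (hypothetical initialization).
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

def inverted_dropout(h, drop_rate, rng):
    # Zero units with probability drop_rate and rescale the survivors.
    keep = 1.0 - drop_rate
    return h * (rng.random(h.shape) < keep) / keep

def mlp_forward(x, params, drop_rate, training, rng):
    """ReLU MLP with dropout after each hidden layer and none on the output."""
    *hidden, (W_out, b_out) = params
    h = x
    for W, b in hidden:
        h = np.maximum(h @ W + b, 0.0)          # fully connected layer + ReLU
        if training:
            h = inverted_dropout(h, drop_rate, rng)
    return h @ W_out + b_out                     # output layer: no dropout

rng = np.random.default_rng(0)
sizes = [20, 64, 64, 5]                          # hypothetical layer widths
params = [init_layer(rng, a, b) for a, b in zip(sizes[:-1], sizes[1:])]
x = rng.normal(size=(8, 20))                     # a batch of 8 examples

train_a = mlp_forward(x, params, drop_rate=0.5, training=True, rng=rng)
train_b = mlp_forward(x, params, drop_rate=0.5, training=True, rng=rng)
eval_a = mlp_forward(x, params, drop_rate=0.5, training=False, rng=rng)
eval_b = mlp_forward(x, params, drop_rate=0.5, training=False, rng=rng)
```

The two training passes give different outputs because each draws its own masks, while the two evaluation passes are identical because every unit stays on.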
Key idea
Dropout randomly silences neurons during training so they cannot co-adapt, which acts like averaging an ensemble of thinned networks.