What it is
A multilayer perceptron (MLP) is a stack of fully connected layers. Each layer computes a linear transform followed by a nonlinear activation.
- Each unit takes a weighted sum of its inputs plus a bias.
- An activation like ReLU or sigmoid is applied elementwise.
- Layers between input and output are called hidden layers.
Why nonlinearity matters
Without activations, stacking linear layers collapses into a single linear map. The nonlinearity is what lets an MLP bend decision boundaries and act as a universal approximator given enough width.
Forward pass
For one layer the output is a matrix multiply of weights by the input vector, plus the bias, then the activation. Data flows forward layer by layer.
Training adjusts weights by backpropagation, which propagates the loss gradient backward through the layers.
Key idea
An MLP interleaves linear maps with nonlinear activations so that stacking layers builds genuinely richer functions instead of collapsing into one linear map.