← Lessons

quiz vs the machine

Silver1050

Machine Learning

The Multilayer Perceptron

Stacked linear layers plus nonlinearity make a universal function approximator.

4 min read · intro · beat Silver to climb

What it is

A multilayer perceptron (MLP) is a stack of fully connected layers. Each layer computes a linear transform followed by a nonlinear activation.

  • Each unit takes a weighted sum of its inputs plus a bias.
  • An activation like ReLU or sigmoid is applied elementwise.
  • Layers between input and output are called hidden layers.

Why nonlinearity matters

Without activations, stacking linear layers collapses into a single linear map. The nonlinearity is what lets an MLP bend decision boundaries and act as a universal approximator given enough width.

Forward pass

For one layer the output is a matrix multiply of weights by the input vector, plus the bias, then the activation. Data flows forward layer by layer.

Training adjusts weights by backpropagation, which propagates the loss gradient backward through the layers.

Key idea

An MLP interleaves linear maps with nonlinear activations so that stacking layers builds genuinely richer functions instead of collapsing into one linear map.

Check yourself

Answer to earn rating on the learn ladder.

1. Why does an MLP need nonlinear activations?

2. What are the layers between input and output called?