The LSTM and GRU Recap

Gated recurrence

The LSTM and GRU add learnable gates that control how memory flows, which mitigates (though does not fully eliminate) the vanishing gradient problem of plain RNNs.

The LSTM

An LSTM keeps a separate cell state plus a hidden state and uses three gates.

The forget gate decides what to erase from the cell state.
The input gate decides what new information to write.
The output gate decides what to expose as the hidden state.

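The cell state is updated additively: the old state is scaled by the forget gate and the gated new information is added to it. This gives the cell state a nearly linear path, so gradients can survive over many steps as long as the forget gate stays close to 1.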
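The three gates and the cell update can be sketched as a single time step in NumPy. This is a minimal illustration, not a reference implementation: the function name lstm_step, the stacked weight layout (forget, input, output, candidate), and the toy sizes are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step (illustrative layout, not a library API).

    W has shape (4 * hidden, input + hidden) and b has shape (4 * hidden,).
    The four row blocks are assumed to be: forget, input, output, candidate.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])  # forget gate: what to erase
    i = sigmoid(z[1 * hidden:2 * hidden])  # input gate: what to write
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate: what to expose
    g = np.tanh(z[3 * hidden:4 * hidden])  # candidate values to write
    c = f * c_prev + i * g                 # additive cell-state update
    h = o * np.tanh(c)                     # hidden state read from the cell
    return h, c

# Toy sizes and random weights, chosen only for demonstration.
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size + hidden_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):  # a sequence of 5 inputs
    h, c = lstm_step(x, h, c, W, b)
```

Along the cell path, the only factor applied to c_prev is the forget gate f, so when f is near 1 and i is near 0 the cell state passes through a step almost unchanged.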

The GRU

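A GRU is a lighter variant with two gates, an update gate and a reset gate, and no separate cell state; the hidden state itself serves as the memory. It needs three weight blocks where the LSTM needs four, so it has roughly 25% fewer parameters at the same hidden size, and on many tasks it performs comparably to the LSTM.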
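A single GRU step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the function name gru_step, the separate weight matrices, and the toy sizes are made up for the example, and the sign convention for the update gate (here z weights the new candidate) differs between papers and libraries.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One GRU time step (illustrative, not a library API).

    Each W_* has shape (hidden, input + hidden); each b_* has shape (hidden,).
    """
    xh = np.concatenate([x, h_prev])
    z = sigmoid(W_z @ xh + b_z)  # update gate: how much to overwrite
    r = sigmoid(W_r @ xh + b_r)  # reset gate: how much past state feeds the candidate
    h_tilde = np.tanh(W_h @ np.concatenate([x, r * h_prev]) + b_h)
    # Interpolate between the old state and the candidate; there is no cell state.
    return (1.0 - z) * h_prev + z * h_tilde

# Toy sizes and random weights, chosen only for demonstration.
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
shape = (hidden_size, input_size + hidden_size)
W_z, W_r, W_h = (rng.normal(scale=0.1, size=shape) for _ in range(3))
b_z, b_r, b_h = (np.zeros(hidden_size) for _ in range(3))

h = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):  # a sequence of 5 inputs
    h = gru_step(x, h, W_z, W_r, W_h, b_z, b_r, b_h)

# Parameter counts under this formulation (one weight matrix and one bias per block).
per_block = hidden_size * (input_size + hidden_size) + hidden_size
gru_params = 3 * per_block   # update, reset, candidate
lstm_params = 4 * per_block  # forget, input, output, candidate
```

With these toy sizes each block holds 32 parameters, giving 96 for the GRU against 128 for an LSTM of the same width. Library implementations may store biases differently, so their exact counts can deviate from this sketch.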

Key idea

Gates let LSTMs and GRUs decide what to keep, write, and read, preserving a stable memory path that carries gradients across long sequences.
