Gated recurrence
The LSTM and GRU add learnable gates that control how information flows through memory, which mitigates (but does not fully eliminate) the vanishing gradient problem of plain RNNs.
The LSTM
An LSTM keeps a separate cell state plus a hidden state and uses three gates.
- The forget gate decides what to erase from the cell state.
- The input gate decides what new information to write.
- The output gate decides what to expose as the hidden state.
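The cell state is updated additively: the old state is scaled elementwise by the forget gate and the gated new content is added to it. This gives a nearly linear path, so gradients can survive over many steps as long as the forget gate stays close to 1.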
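The three gates and the cell-state update can be sketched as a single time step. This is a minimal illustration in plain Python, assuming the common formulation in which every gate reads the concatenation of the previous hidden state and the current input. The names `lstm_step`, `zero_params`, and the `params` layout are hypothetical, chosen only for this example.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, z, b):
    # W is a list of rows; returns W @ z + b as a plain list.
    return [sum(w * zi for w, zi in zip(row, z)) + bi for row, bi in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step on plain Python lists.

    params maps "forget", "input", "output", "candidate" to a (W, b) pair
    acting on the concatenation [h_prev, x] (an assumed layout).
    """
    z = h_prev + x  # list concatenation: [h_prev, x]

    def layer(name, act):
        W, b = params[name]
        return [act(v) for v in affine(W, z, b)]

    f = layer("forget", sigmoid)       # what to erase from the cell state
    i = layer("input", sigmoid)        # how much new information to write
    o = layer("output", sigmoid)       # what to expose as the hidden state
    g = layer("candidate", math.tanh)  # proposed new content

    # Additive update: old memory scaled by f, plus gated new content.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def zero_params(hidden, inputs):
    # All-zero weights: every gate is sigmoid(0) = 0.5 and the candidate is 0.
    W = [[0.0] * (hidden + inputs) for _ in range(hidden)]
    b = [0.0] * hidden
    return {name: (W, b) for name in ("forget", "input", "output", "candidate")}

h, c = lstm_step([1.0], [0.0, 0.0], [1.0, -2.0], zero_params(2, 1))
print(c)  # [0.5, -1.0]: the forget gate of 0.5 halves the old cell state
```

With all-zero weights the example isolates the memory path: nothing new is written, and the cell state is simply scaled by the forget gate at each step.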
The GRU
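A GRU is a lighter variant with two gates, an update gate and a reset gate, and no separate cell state; the hidden state itself serves as the memory. For the same hidden size it has three weight blocks where the LSTM has four, so roughly three-quarters of the parameters, and it often performs comparably to the LSTM.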
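A single GRU step can be sketched the same way. This is an illustration under stated assumptions, not a reference implementation: it uses the convention in which the update gate weights the new candidate, h = (1 - z) * h_prev + z * h_candidate, while some papers and libraries swap the roles of z and 1 - z. The names `gru_step`, `zero_params`, and the `params` layout are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, z, b):
    # W is a list of rows; returns W @ z + b as a plain list.
    return [sum(w * zi for w, zi in zip(row, z)) + bi for row, bi in zip(W, b)]

def gru_step(x, h_prev, params):
    """One GRU time step on plain Python lists.

    params maps "update", "reset", "candidate" to a (W, b) pair
    (an assumed layout; weights act on [h, x]).
    """
    hx = h_prev + x  # list concatenation: [h_prev, x]

    Wz, bz = params["update"]
    Wr, br = params["reset"]
    Wh, bh = params["candidate"]

    z = [sigmoid(v) for v in affine(Wz, hx, bz)]  # how much to overwrite
    r = [sigmoid(v) for v in affine(Wr, hx, br)]  # how much history the candidate sees

    # The reset gate scales the old hidden state before the candidate is computed.
    reset_hx = [rk * hk for rk, hk in zip(r, h_prev)] + x
    h_cand = [math.tanh(v) for v in affine(Wh, reset_hx, bh)]

    # Interpolate between the old state and the candidate; no separate cell state.
    return [(1.0 - zk) * hk + zk * ck for zk, hk, ck in zip(z, h_prev, h_cand)]

def zero_params(hidden, inputs):
    # All-zero weights: both gates are 0.5 and the candidate is 0.
    W = [[0.0] * (hidden + inputs) for _ in range(hidden)]
    b = [0.0] * hidden
    return {name: (W, b) for name in ("update", "reset", "candidate")}

h = gru_step([1.0], [1.0, -2.0], zero_params(2, 1))
print(h)  # [0.5, -1.0]: halfway between the old state and a zero candidate
```

The three `(W, b)` pairs here, against four in a comparable LSTM step, are where the GRU's parameter saving comes from.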
Key idea
Gates let LSTMs and GRUs decide what to keep, write, and read, preserving a stable memory path that carries gradients across long sequences.