The LSTM and GRU Recap

Gates regulate memory so recurrent nets can learn long-range dependencies.

Gated recurrence

The LSTM and GRU add learnable gates that control how memory flows through time, which largely mitigates the vanishing gradient problem of plain RNNs.

The LSTM

An LSTM keeps a separate cell state plus a hidden state and uses three gates.

  • The forget gate decides what to erase from the cell state.
  • The input gate decides what new information to write.
  • The output gate decides what to expose as the hidden state.
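
The three gates can be sketched as a single time step in plain Python. This is a minimal illustration of the common textbook formulation, not a reference implementation: the helper names (sigmoid, affine, init_lstm, lstm_step), the weight layout, and the initialization range are all assumptions made for this example, and lists are used instead of an array library to keep it dependency-free.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def affine(W, b, v):
    """Return W @ v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bk for row, bk in zip(W, b)]


def init_lstm(input_size, hidden_size, seed=0):
    """Hypothetical helper: small random weights and zero biases."""
    rng = random.Random(seed)
    width = input_size + hidden_size
    params = {}
    for gate in "fiog":  # forget, input, output, candidate
        params["W" + gate] = [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                              for _ in range(hidden_size)]
        params["b" + gate] = [0.0] * hidden_size
    return params


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step; returns the new hidden state and cell state."""
    xh = x + h_prev  # list concatenation: [x; h_prev]
    f = [sigmoid(a) for a in affine(p["Wf"], p["bf"], xh)]    # what to erase
    i = [sigmoid(a) for a in affine(p["Wi"], p["bi"], xh)]    # what to write
    o = [sigmoid(a) for a in affine(p["Wo"], p["bo"], xh)]    # what to expose
    g = [math.tanh(a) for a in affine(p["Wg"], p["bg"], xh)]  # candidate content
    # Cell update: scale the old state, then add the gated candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Hidden state: the output gate filters a squashed view of the cell.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


params = init_lstm(input_size=3, hidden_size=2)
h, c = [0.0, 0.0], [0.0, 0.0]
for x in ([1.0, 0.0, -1.0], [0.5, 0.5, 0.5]):
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

With the forget gate near 1 and the input gate near 0, lstm_step returns a cell state almost identical to c_prev, which is the "keep" behavior the gates are meant to make learnable.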

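The cell state is updated additively: the forget gate scales the old state elementwise and the input gate adds new content. Along this nearly linear path, gradients survive over many steps as long as the forget gate stays close to 1.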
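
A toy calculation can make the gradient claim concrete. Along the direct cell-state path, the derivative of one cell state with respect to the previous one is just the forget gate value, so the gradient across many steps is a product of those values. The numbers below are illustrative assumptions, not measurements: 0.99 stands in for a forget gate that has learned to keep its memory, and 0.5 stands in for the per-step factor (recurrent weight times tanh derivative) of a plain RNN.

```python
def path_gradient(factors):
    """Multiply per-step derivative factors along one memory path."""
    grad = 1.0
    for factor in factors:
        grad *= factor
    return grad


steps = 100

# LSTM cell path (direct path only): each step contributes the forget gate value.
lstm_grad = path_gradient([0.99] * steps)

# Plain RNN: each step contributes weight * tanh'(pre-activation); 0.5 is assumed.
rnn_grad = path_gradient([0.5] * steps)

print(f"LSTM cell path after {steps} steps: {lstm_grad:.3e}")
print(f"Plain RNN path after {steps} steps: {rnn_grad:.3e}")
```

The first product stays at a usable magnitude while the second collapses toward zero. The sketch ignores the extra gradient paths through the hidden state and the gates, so it shows the mechanism rather than the exact gradient of a trained network.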

The GRU

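A GRU is a lighter variant with two gates, an update gate and a reset gate, and no separate cell state; the hidden state itself serves as the memory. In the standard formulation it has three weight blocks where the LSTM has four, so it uses about three quarters of the parameters at the same hidden size, and it often matches the LSTM in accuracy.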
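
For comparison, here is a minimal sketch of one GRU time step in plain Python. The helper names (sigmoid, affine, init_gru, gru_step) and the weight layout are assumptions for this example. Sign conventions for the update gate differ between papers and libraries; this sketch assumes that an update gate near 0 keeps the old hidden state.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def affine(W, b, v):
    """Return W @ v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bk for row, bk in zip(W, b)]


def init_gru(input_size, hidden_size, seed=0):
    """Hypothetical helper: small random weights and zero biases."""
    rng = random.Random(seed)
    width = input_size + hidden_size
    params = {}
    for name in "zrn":  # update gate, reset gate, candidate
        params["W" + name] = [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                              for _ in range(hidden_size)]
        params["b" + name] = [0.0] * hidden_size
    return params


def gru_step(x, h_prev, p):
    """One GRU time step; the hidden state doubles as the memory."""
    xh = x + h_prev  # list concatenation: [x; h_prev]
    z = [sigmoid(a) for a in affine(p["Wz"], p["bz"], xh)]  # update gate
    r = [sigmoid(a) for a in affine(p["Wr"], p["br"], xh)]  # reset gate
    # The reset gate decides how much of the old state feeds the candidate.
    gated = [rk * hk for rk, hk in zip(r, h_prev)]
    n = [math.tanh(a) for a in affine(p["Wn"], p["bn"], x + gated)]
    # Assumed convention: z near 0 keeps h_prev, z near 1 takes the candidate.
    return [(1.0 - zk) * hk + zk * nk for zk, hk, nk in zip(z, h_prev, n)]


gru_params = init_gru(input_size=3, hidden_size=2)
h = [0.0, 0.0]
for x in ([1.0, 0.0, -1.0], [0.5, 0.5, 0.5]):
    h = gru_step(x, h, gru_params)
print(h)
```

The step uses three weight matrices and returns a single state vector, which is where the parameter saving relative to the LSTM comes from.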

Key idea

Gates let LSTMs and GRUs decide what to keep, write, and read, preserving a stable memory path that carries gradients across long sequences.

Check yourself

1. Which gate decides what to erase from the LSTM cell state?

2. How does a GRU differ from an LSTM?

3. Why does the LSTM cell state help gradients?