Machine Learning

The LSTM Cell

A gated recurrent cell that carries a separate cell state to retain long-range information.

The long short-term memory (LSTM) cell is a recurrent unit designed to retain information over long spans. Alongside the hidden state that a plain RNN already has, it carries a separate cell state that flows through time with only small, gated changes.

The gates

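Three gates regulate the cell state. Each is a small sigmoid layer that reads the current input and the previous hidden state and outputs values between zero and one, where zero blocks a component and one lets it through: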

  • The forget gate decides what to erase from the cell state.
  • The input gate decides what new information to write.
  • The output gate decides what part of the cell state to expose as the hidden state.
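
To make the three gates concrete, here is a minimal NumPy sketch of a single LSTM time step. The function name `lstm_step`, the stacked weight layout `W`, and the sizes are assumptions chosen for illustration; real libraries arrange and initialise their parameters differently.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step (illustrative layout, not a library API).

    x:      input vector, shape (input_size,)
    h_prev: previous hidden state, shape (hidden_size,)
    c_prev: previous cell state, shape (hidden_size,)
    W:      stacked weights, shape (4 * hidden_size, input_size + hidden_size)
    b:      stacked biases, shape (4 * hidden_size,)
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b

    f = sigmoid(z[0 * n:1 * n])   # forget gate: what to erase from the cell state
    i = sigmoid(z[1 * n:2 * n])   # input gate: what new information to write
    o = sigmoid(z[2 * n:3 * n])   # output gate: what part of the cell state to expose
    g = np.tanh(z[3 * n:4 * n])   # candidate values that may be written

    c = f * c_prev + i * g        # scale the old state, then add the gated candidate
    h = o * np.tanh(c)            # hidden state is a gated view of the cell state
    return h, c, (f, i, o)

# Run a short random sequence through the cell (sizes and weights are arbitrary).
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size + hidden_size))
b = np.zeros(4 * hidden_size)
h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):
    h, c, gates = lstm_step(x, h, c, W, b)
```

Stacking the four weight blocks into one matrix is only a convenience so that a single matrix product serves all gates; keeping four separate matrices would be equivalent.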

Because the cell state is updated additively, with the old state scaled elementwise by the forget gate instead of being pushed through a weight matrix and a squashing nonlinearity at every step, gradients flow more easily across many steps. This is how the LSTM mitigates the vanishing gradient problem that hurts plain RNNs.
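
The gradient argument can be written out as a short derivation. The symbols are assumptions that follow common notation and do not appear in the lesson itself: $c_t$ is the cell state, $f_t$ and $i_t$ are the forget and input gates, $g_t$ is the candidate update, and $\odot$ is elementwise multiplication. The Jacobian below covers only the direct path through the cell state, holding the previous hidden state fixed.

```latex
\begin{align}
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
\frac{\partial c_t}{\partial c_{t-1}} &= \operatorname{diag}(f_t) \\
\frac{\partial c_T}{\partial c_t} &= \prod_{k=t+1}^{T} \operatorname{diag}(f_k)
\end{align}
```

Along this path the gradient is scaled only by forget gate values, so it stays close to its original size for as long as the network keeps those gates near one.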

Why it works

The gates let the network learn when to keep, update, or release memory based on the data. A relevant fact can persist for many steps until the forget gate clears it. This selective memory made LSTMs the workhorse for sequence tasks before transformers became dominant.
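
A toy illustration of this persistence, using a single scalar cell state and hand-set gate values. The helper `run_cell` and the gate schedule are hypothetical, and the gates are idealised to exactly zero or one, whereas a trained sigmoid gate only approaches those values.

```python
def run_cell(forget, write, candidate, c0=0.0):
    """Apply c_t = f_t * c_{t-1} + i_t * g_t for scalar gate sequences."""
    c = c0
    history = []
    for f, i, g in zip(forget, write, candidate):
        c = f * c + i * g
        history.append(c)
    return history

steps = 50
# Step 0 writes the value 1.0, steps 1 to 48 keep it, step 49 clears it.
forget = [0.0] + [1.0] * (steps - 2) + [0.0]
write = [1.0] + [0.0] * (steps - 1)
candidate = [1.0] * steps

history = run_cell(forget, write, candidate)
print(history[0], history[48], history[49])  # prints: 1.0 1.0 0.0
```

The stored value survives 48 steps unchanged because the forget gate stays open and the input gate stays closed, then disappears in the one step where the forget gate drops to zero.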

Key idea

The LSTM uses forget, input, and output gates over a carried cell state to selectively remember information across long sequences and ease vanishing gradients.

Check yourself

1. What does the forget gate control?

2. Why does the cell state help gradients flow?

3. Which problem do LSTMs mainly address compared to plain RNNs?