What it is
A long short-term memory (LSTM) cell is a recurrent unit designed to retain information over many time steps. Alongside the hidden state of a plain recurrent unit, it keeps a separate cell state that carries memory along the sequence with little change unless the gates modify it.
The three gates
An LSTM controls its memory with three learned gates. Each is a small sigmoid layer that reads the current input and the previous hidden state and outputs values between zero and one.
- The forget gate decides what to erase from the cell state
- The input gate decides what new information to write
- The output gate decides what part of the cell state to expose as the hidden state
Because the cell state mostly flows straight through, gradients survive across many steps. This is why LSTMs can learn long-range dependencies that plain recurrent networks tend to miss.
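The gate roles listed above can be sketched as a single step function. This is a minimal illustration using only the standard library, not a reference implementation: the names `lstm_step`, `init_params`, and the `params` layout are hypothetical, and the tanh "candidate" layer that proposes new content follows the common LSTM formulation rather than anything stated in this note.

```python
import math
import random


def sigmoid(x):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W and a list vector v."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases for each layer (assumed layout)."""
    rng = random.Random(seed)
    n = hidden_size + input_size

    def layer():
        W = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(hidden_size)]
        return W, [0.0] * hidden_size

    return {name: layer() for name in ("forget", "input", "output", "candidate")}


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    z = h_prev + x  # list concatenation: previous hidden state, then input
    f = [sigmoid(a) for a in affine(*params["forget"], z)]  # what to erase
    i = [sigmoid(a) for a in affine(*params["input"], z)]   # what to write
    o = [sigmoid(a) for a in affine(*params["output"], z)]  # what to expose
    g = [math.tanh(a) for a in affine(*params["candidate"], z)]  # proposed content
    # Keep a gated share of the old cell state, then add gated new content.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # The hidden state is a gated, squashed view of the cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

Calling `lstm_step` in a loop, feeding each returned `h` and `c` back in with the next input, processes a whole sequence with one set of parameters.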
Why the design works
The key trick is an additive path for the cell state rather than a repeated transformation of the whole state at every step.
- Adding new content instead of multiplying mitigates the vanishing gradient problem, as long as the forget gate stays close to one
- Gates let the cell keep a fact for a long time, then drop it when no longer needed
- The same gated function, with the same weights, repeats at every step
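The points above can be checked numerically with a rough sketch. It makes two simplifying assumptions that are not part of the note: the forget gate is held at fixed logits instead of being computed from inputs, and only the direct cell-state path is followed, ignoring gradient that flows through the hidden state. The function names are hypothetical.

```python
import math


def sigmoid(x):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_cell_gradient(forget_logits):
    """Gradient of the last cell state w.r.t. the first along the direct path.

    With c_t = f_t * c_{t-1} + i_t * g_t, the direct-path derivative of
    c_t with respect to c_{t-1} is f_t, so the total is a product of gates.
    """
    grad = 1.0
    for logit in forget_logits:
        grad *= sigmoid(logit)
    return grad


def plain_rnn_gradient(w, h0, steps):
    """Gradient of h_T w.r.t. h_0 for the scalar recurrence h_t = tanh(w * h_{t-1})."""
    grad, h = 1.0, h0
    for _ in range(steps):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)  # chain rule through tanh
    return grad


# Forget gate held near one for 100 steps: the gradient stays usable.
kept = lstm_cell_gradient([6.0] * 100)
# Same, but the final step closes the forget gate: the memory is dropped.
dropped = lstm_cell_gradient([6.0] * 99 + [-6.0])
# Plain scalar recurrence with weight 0.5: the gradient shrinks every step.
plain = plain_rnn_gradient(0.5, 0.5, 100)
print(kept, dropped, plain)
```

The comparison is only indicative, since a trained LSTM sets its gates per step and per unit, but it shows why a forget gate near one preserves gradient while a repeated contraction does not.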
Key idea
An LSTM uses forget, input, and output gates around a protected cell state to hold memory across long sequences.