Processing sequences
A recurrent neural network (RNN) reads a sequence one element at a time, updating a hidden state that summarizes everything seen so far.
- At each step the new hidden state depends on the current input and the previous hidden state.
- The same weights are reused at every step, called parameter sharing across time.
- The hidden state acts as the network memory.
Training
RNNs are trained with backpropagation through time, which unrolls the loop into a deep chain and propagates gradients backward across steps.
The trouble
Long chains cause vanishing or exploding gradients, so plain RNNs struggle to learn dependencies that span many steps. This limitation motivated gated cells like the LSTM.
Key idea
An RNN threads a single hidden state through a sequence with shared weights, giving it memory but also long range gradient problems.