The RNN for Sequences
A recurrent neural network, or RNN, reads a sequence one element at a time and keeps a hidden state that summarizes everything seen so far. At each step it combines the current input with the previous hidden state to produce a new state.
Why recurrence helps
- The same weights are reused at every time step, so the model handles variable length sequences.
- The hidden state acts as a memory passing information forward through time.
- Outputs can be produced at each step, like a tag per word, or once at the end, like a sentiment label.
This shared structure makes RNNs natural for language, where each word depends on those before it.
The vanishing gradient problem
Training uses backpropagation through time, unrolling the network across the sequence. Over many steps the gradients can shrink toward zero, so the network struggles to learn long range dependencies. This vanishing gradient problem motivated gated variants that protect information over long spans.
Key idea
An RNN carries a hidden state through a sequence using shared weights, but plain RNNs struggle with long range dependencies due to vanishing gradients.