What it is
A recurrent neural network processes a sequence one element at a time while carrying a hidden state forward. The hidden state acts as a memory: a fixed-size summary of everything seen so far.
How a step works
At each step the network takes the current input and the previous hidden state and produces a new hidden state.
- The new state combines the current input with the previous hidden state, typically through learned weight matrices followed by a nonlinearity such as tanh
- The same weights are applied at every step (weight sharing across time)
- An output can be read from the hidden state at any step
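One common concrete form of this step is the Elman-style recurrence below. The tanh nonlinearity and the symbols W_xh, W_hh, W_hy, b_h, b_y are notation assumed here, not taken from the text. The point to notice is that the same matrices appear at every time step t.

```latex
\begin{aligned}
h_t &= \tanh\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right) \\
y_t &= W_{hy}\, h_t + b_y
\end{aligned}
```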
Because the same function repeats, a recurrent network can handle sequences of any length with a fixed set of weights.
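To illustrate the fixed-weight claim, here is a minimal pure-Python sketch, assuming an Elman-style tanh step with randomly initialized weights. The helper names `rnn_step`, `run_rnn`, and `init_params` are hypothetical. The same three parameter arrays process a 3-step and a 9-step sequence, and the hidden state keeps the same size throughout.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One Elman-style step (assumed form): h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    a = matvec(W_xh, x)        # contribution of the fresh input
    r = matvec(W_hh, h_prev)   # contribution of the running memory
    return [math.tanh(ai + ri + bi) for ai, ri, bi in zip(a, r, b_h)]


def run_rnn(xs, W_xh, W_hh, b_h, hidden_size):
    """Apply the same weights at every step and return every hidden state."""
    h = [0.0] * hidden_size    # initial hidden state
    states = []
    for x in xs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


def init_params(input_size, hidden_size, seed=0):
    """Hypothetical small random initialization, seeded for repeatability."""
    rng = random.Random(seed)
    W_xh = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)]
            for _ in range(hidden_size)]
    W_hh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
            for _ in range(hidden_size)]
    b_h = [0.0] * hidden_size
    return W_xh, W_hh, b_h


if __name__ == "__main__":
    input_size, hidden_size = 2, 4
    W_xh, W_hh, b_h = init_params(input_size, hidden_size)
    short_seq = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    long_seq = short_seq * 3  # length 9, processed with the same weights
    for seq in (short_seq, long_seq):
        states = run_rnn(seq, W_xh, W_hh, b_h, hidden_size)
        print(len(states), len(states[-1]))
```

Running it prints `3 4` and then `9 4`: one hidden state per input element, each of size 4, regardless of sequence length.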
Training and its limits
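Recurrent networks train with backpropagation through time (BPTT), which unrolls the loop into a chain with one copy of the step per time step and propagates gradients backward through it.
- Their step-by-step processing of sequences makes them a natural fit for language, audio, and time series
- But the gradient reaching an early step is a product of one factor per step, and when those factors are smaller than one in magnitude the product shrinks toward zero over long chains (vanishing gradients), so plain recurrent networks struggle to learn long-range dependencies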
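The vanishing-gradient effect can be seen in a tiny sketch, assuming a scalar recurrence h_t = tanh(w*h_{t-1} + u*x_t) with hypothetical weights w = 0.9 and u = 0.5. Full BPTT would compute gradients with respect to the weights; this sketch tracks only dh_T/dh_0, the factor that controls how much the earliest state can still influence the last one. Each backward step multiplies by w * (1 - h_t^2), and since the tanh derivative is at most 1, the magnitude is bounded by 0.9^T.

```python
import math


def forward(w, u, xs, h0=0.0):
    """Unroll a scalar RNN h_t = tanh(w*h_{t-1} + u*x_t); return [h_0, ..., h_T]."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs


def grad_final_wrt_initial(w, hs):
    """Backpropagate through the unrolled chain to get dh_T/dh_0.

    Each step contributes dh_t/dh_{t-1} = w * (1 - h_t**2): the tanh
    derivative times the shared recurrent weight.
    """
    g = 1.0
    for t in range(len(hs) - 1, 0, -1):  # walk backward from step T to step 1
        g *= w * (1.0 - hs[t] ** 2)
    return g


if __name__ == "__main__":
    w, u = 0.9, 0.5  # hypothetical weights chosen for illustration
    for T in (5, 20, 50):
        hs = forward(w, u, [1.0] * T)
        print(T, grad_final_wrt_initial(w, hs))
```

The printed gradient falls by many orders of magnitude as T grows from 5 to 50, which is why an input near the start of a long sequence contributes almost nothing to learning in a plain recurrent network.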
This weakness is exactly what gated cells like the LSTM were designed to fix.
Key idea
A recurrent network carries a hidden state across a sequence, reusing one set of weights at every step.