What it is
A gated recurrent unit (GRU) is a streamlined recurrent cell that uses gates to control what it remembers. It keeps the memory benefits of an LSTM but uses fewer parts, merging the cell state and hidden state into one vector.
The two gates
A GRU uses two gates instead of three.
- The update gate decides how much of the old hidden state to keep versus replace with new content
- The reset gate decides how much of the past to ignore when forming the new candidate state
The update gate acts like a blend control between the old hidden state and the new candidate. When it stays close to the "keep" end of its range, the hidden state passes through almost unchanged, which preserves long-range memory.
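The blend described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names, the parameter dictionary layout, and the weight scale are assumptions made for the example. It also assumes the convention in which an update gate near 1 keeps the old state; some references flip the roles of z and 1 - z.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(input_size, hidden_size, seed=0):
    """Small random weights for three blocks: update (z), reset (r), candidate (h).

    The dictionary layout and the 0.1 weight scale are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    params = {}
    for name in ("z", "r", "h"):
        params["W" + name] = rng.normal(0.0, 0.1, (hidden_size, input_size))
        params["U" + name] = rng.normal(0.0, 0.1, (hidden_size, hidden_size))
        params["b" + name] = np.zeros(hidden_size)
    return params


def gru_step(x, h_prev, p):
    """One GRU time step over a single hidden state vector."""
    # Update gate: how much of the old hidden state to keep (assumed: 1 = keep).
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])
    # Reset gate: how much of the past to use when forming the candidate.
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])
    # Candidate state: new content, built from the input and the reset-scaled past.
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])
    # Blend old state and candidate, element by element.
    return z * h_prev + (1.0 - z) * h_cand


params = init_params(input_size=3, hidden_size=4)
x = np.ones(3)
h_prev = np.array([0.9, -0.9, 0.8, -0.8])

# Push the update gate toward "keep" with a large positive bias.
params["bz"] = np.full(4, 20.0)
h_keep = gru_step(x, h_prev, params)

# Push the update gate toward "replace" with a large negative bias.
params["bz"] = np.full(4, -20.0)
h_replace = gru_step(x, h_prev, params)
```

With the gate pushed toward keep, h_keep is nearly identical to h_prev. With it pushed toward replace, h_replace is almost entirely the candidate state. Forcing the bias by hand is only a way to show the two extremes; in a trained network the gate values are learned and vary per element and per time step.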
How it compares to LSTM
A GRU and an LSTM solve the same long-range memory problem with different trade-offs.
- A GRU has fewer parameters (three weight blocks per cell versus four in an LSTM of the same size), so it tends to train a little faster and may need less data
- An LSTM has a separate cell state and an extra gate, giving it more capacity
- In practice the two often perform similarly, so the choice is empirical
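The parameter difference can be made concrete with a small count. This sketch assumes each block (a gate or the candidate) has an input weight matrix, a recurrent weight matrix, and one bias vector; some libraries store two bias vectors per block, which changes the totals slightly but not the three-to-four ratio. The function names are hypothetical.

```python
def recurrent_param_count(input_size, hidden_size, num_blocks):
    """Parameters in a gated recurrent cell with num_blocks weight blocks.

    Assumes one input matrix, one recurrent matrix, and one bias per block.
    """
    per_block = (
        hidden_size * input_size      # input weights
        + hidden_size * hidden_size   # recurrent weights
        + hidden_size                 # bias
    )
    return num_blocks * per_block


def gru_param_count(input_size, hidden_size):
    # Update gate, reset gate, candidate state.
    return recurrent_param_count(input_size, hidden_size, 3)


def lstm_param_count(input_size, hidden_size):
    # Input gate, forget gate, output gate, candidate cell content.
    return recurrent_param_count(input_size, hidden_size, 4)


gru_total = gru_param_count(input_size=10, hidden_size=20)
lstm_total = lstm_param_count(input_size=10, hidden_size=20)
ratio = gru_total / lstm_total
```

Under these assumptions a GRU always has three quarters of the parameters of an LSTM with the same input and hidden sizes. Whether that saving matters more than the LSTM's extra capacity depends on the task, which is why the choice is usually settled by trying both.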
Key idea
A GRU uses an update gate and a reset gate over a single state to keep long-range memory with fewer parameters than an LSTM.