The GRU Cell
The gated recurrent unit, or GRU, is a simpler relative of the LSTM. It also uses gates to control what is kept in memory, but it merges some parts of the LSTM design and drops others, so it needs fewer parameters.
Two gates instead of three
- The update gate decides how much of the previous hidden state to keep and how much to replace with new candidate content. It combines the roles of the LSTM's forget and input gates.
- The reset gate decides how much past state to use when computing the new candidate content.
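To make the two gates concrete, here is a minimal NumPy sketch of a single GRU time step. It assumes one common formulation: the update gate z weights the old state (z near 1 keeps it), and the reset gate r scales the previous hidden state before the recurrent matrix multiply. Other sources swap the roles of z and 1 - z, and PyTorch's nn.GRU applies the reset gate after the recurrent multiply instead, so treat this as an illustration rather than a reference implementation. The helper names init_params and gru_step, and the parameter keys, are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, squashing gate pre-activations into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def init_params(input_size, hidden_size, seed=0):
    """Hypothetical helper: small random weights and zero biases for each block."""
    rng = np.random.default_rng(seed)
    params = {}
    for g in ("z", "r", "h"):  # update gate, reset gate, candidate state
        params[f"W_{g}"] = rng.normal(0.0, 0.1, (hidden_size, input_size))
        params[f"U_{g}"] = rng.normal(0.0, 0.1, (hidden_size, hidden_size))
        params[f"b_{g}"] = np.zeros(hidden_size)
    return params


def gru_step(x, h_prev, p):
    """One GRU time step; the returned hidden state is the only state carried."""
    # Update gate: how much of h_prev to keep versus replace.
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev + p["b_z"])
    # Reset gate: how much of h_prev feeds into the candidate.
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev + p["b_r"])
    # Candidate content from the input and the reset-scaled previous state.
    h_tilde = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h_prev) + p["b_h"])
    # Blend: z near 1 keeps the old state, z near 0 takes the candidate.
    return z * h_prev + (1.0 - z) * h_tilde


# Usage: run a random 5-step sequence of 4-dimensional inputs through a 3-unit GRU.
params = init_params(input_size=4, hidden_size=3)
h = np.zeros(3)
xs = np.random.default_rng(1).normal(size=(5, 4))
for x in xs:
    h = gru_step(x, h, params)
print(h)
```

Note that nothing besides h is passed from one step to the next. Forcing the update-gate bias to a large positive value makes the step return h_prev almost unchanged, while a large negative bias makes it return the candidate, which is a quick way to see the keep-versus-replace behavior.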
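The GRU has no output gate and no separate cell state. It carries information directly in the hidden state, which makes the unit leaner. With three weight blocks (two gates plus the candidate content) instead of the LSTM's four, a GRU layer has about three quarters as many weights to train as an LSTM with the same input and hidden sizes.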
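The parameter saving follows from counting weight blocks. This sketch assumes each block has an input weight matrix, a recurrent weight matrix, and one bias vector; the sizes 100 and 128 are arbitrary illustrative values, and the function names are hypothetical. Some libraries, such as PyTorch's nn.LSTM and nn.GRU, keep two bias vectors per block, so their reported counts come out slightly higher, but the 3:4 ratio is unchanged.

```python
def weights_per_block(input_size, hidden_size):
    """Input weights + recurrent weights + one bias vector for one gate or candidate."""
    return hidden_size * input_size + hidden_size * hidden_size + hidden_size


def lstm_param_count(input_size, hidden_size):
    # Four blocks: input gate, forget gate, output gate, candidate cell content.
    return 4 * weights_per_block(input_size, hidden_size)


def gru_param_count(input_size, hidden_size):
    # Three blocks: update gate, reset gate, candidate hidden content.
    return 3 * weights_per_block(input_size, hidden_size)


lstm = lstm_param_count(100, 128)
gru = gru_param_count(100, 128)
print(lstm, gru, gru / lstm)  # 117248 87936 0.75
```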
Choosing between them
GRUs often train faster and perform comparably to LSTMs, especially on smaller datasets. LSTMs may have an edge on some long or complex sequences thanks to their separate cell state. In practice, it is common to try both and keep whichever performs better on validation data.
Key idea
The GRU uses update and reset gates and a single hidden state, giving it fewer parameters than an LSTM while often reaching comparable performance.