GRU Cells

What it is

A gated recurrent unit (GRU) is a streamlined recurrent cell built around gates. It keeps the memory benefits of an LSTM but uses fewer parts, merging the LSTM's separate cell state and hidden state into a single vector.

The two gates

A GRU uses two gates instead of the LSTM's three.

The update gate decides how much of the old hidden state to keep and how much to replace with new content
The reset gate decides how much of the past hidden state to ignore when forming the new candidate state

The update gate acts like a blend control between the old hidden state and the candidate. When it stays near its keep setting, the hidden state passes through almost unchanged, which preserves long-range memory.
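The roles of the two gates can be sketched as one step of a GRU in plain Python. This is a minimal illustration, not a reference implementation: the parameter names (`Wz`, `Uz`, `bz` and so on), the `toy_params` helper, and the convention that an update gate value near 1 means keep are all assumptions made for the example. Some formulations flip the convention so that a value near 0 means keep.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gate(W, U, b, x, h, squash):
    # squash(W x + U h + b), computed element by element.
    return [squash(wx + uh + bi)
            for wx, uh, bi in zip(matvec(W, x), matvec(U, h), b)]


def gru_step(x, h_prev, p):
    # Assumed convention: z near 1 keeps the old state, z near 0 replaces it.
    z = gate(p["Wz"], p["Uz"], p["bz"], x, h_prev, sigmoid)  # update gate
    r = gate(p["Wr"], p["Ur"], p["br"], x, h_prev, sigmoid)  # reset gate
    # The reset gate scales the old state before it feeds the candidate.
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = gate(p["Wh"], p["Uh"], p["bh"], x, reset_h, math.tanh)
    # The update gate blends old state and candidate, element by element.
    return [zi * hi + (1.0 - zi) * ci
            for zi, hi, ci in zip(z, h_prev, cand)]


def toy_params(update_bias):
    # Hypothetical parameters: 1 input, 2 hidden units, all weights zero.
    zeros_in = [[0.0], [0.0]]
    zeros_rec = [[0.0, 0.0], [0.0, 0.0]]
    return {
        "Wz": zeros_in, "Uz": zeros_rec, "bz": [update_bias] * 2,
        "Wr": zeros_in, "Ur": zeros_rec, "br": [0.0, 0.0],
        "Wh": zeros_in, "Uh": zeros_rec, "bh": [0.0, 0.0],
    }


h_prev = [0.5, -0.3]
kept = gru_step([1.0], h_prev, toy_params(20.0))       # gate pinned near keep
replaced = gru_step([1.0], h_prev, toy_params(-20.0))  # gate pinned near replace
```

With the update gate pinned near keep, `kept` matches `h_prev` almost exactly. With the gate pinned the other way, the state is overwritten by the candidate, which is close to zero here only because the toy weights are all zero.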

How it compares to LSTM

A GRU and an LSTM solve the same long-range memory problem with different trade-offs.

A GRU has fewer parameters (three weight blocks to the LSTM's four at the same layer size), so it typically trains a little faster and may need less data
An LSTM has a separate cell state and an extra gate, giving it more capacity
In practice the two often perform similarly, so the choice is empirical
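The parameter gap in the comparison above can be made concrete with a small count. This sketch assumes a single layer where each block has input weights, recurrent weights, and one bias vector. The function names and the example sizes are illustrative, and libraries that store two bias vectors per block will report slightly different totals, though the 3 to 4 ratio is unchanged.

```python
def block_params(input_size, hidden_size):
    # Input weights + recurrent weights + one bias vector for a single block.
    return hidden_size * input_size + hidden_size * hidden_size + hidden_size


def gru_params(input_size, hidden_size):
    # Three blocks: update gate, reset gate, candidate state.
    return 3 * block_params(input_size, hidden_size)


def lstm_params(input_size, hidden_size):
    # Four blocks: input gate, forget gate, output gate, candidate cell state.
    return 4 * block_params(input_size, hidden_size)


# Illustrative sizes, chosen only for the example.
gru_count = gru_params(128, 256)
lstm_count = lstm_params(128, 256)
ratio = gru_count / lstm_count  # 0.75 for any layer size under these assumptions
```

Under these assumptions a GRU layer always carries three quarters of the parameters of an LSTM layer of the same size.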

Key idea

A GRU uses an update gate and a reset gate over a single state to keep long-range memory with fewer parameters than an LSTM.
