Gated Recurrent Unit (GRU)
- Neural Network Zoo | Fjodor Van Veen
- Recurrent Neural Network (RNN) Variants:
- Understanding GRU Networks | Simeon Kostadinov - Towards Data Science
- Animated RNN, LSTM and GRU | Raimi Karim - Towards Data Science
A gating mechanism in recurrent neural networks (RNNs).

Gated recurrent units (GRUs) are a slight variation on LSTMs. They have one fewer gate and are wired slightly differently: instead of input, output and forget gates, they have an update gate and a reset gate. The update gate determines both how much information to keep from the previous state and how much information to let in from the previous layer. The reset gate functions much like the forget gate of an LSTM, but it is located slightly differently. GRUs have no output gate, so they always send out their full state. In most cases they function very similarly to LSTMs. The biggest difference is that GRUs are slightly faster and easier to run, but also slightly less expressive. In practice these effects tend to cancel each other out: a bigger network is needed to regain some expressiveness, and that in turn cancels out the performance benefit. In some cases where the extra expressiveness is not needed, GRUs can outperform LSTMs.
Chung, Junyoung, et al. “Empirical evaluation of gated recurrent neural networks on sequence modeling.” arXiv preprint arXiv:1412.3555 (2014).

The GRU is like a long short-term memory (LSTM) unit with a forget gate, but it has fewer parameters than an LSTM because it lacks an output gate. GRU performance on certain tasks of polyphonic music modeling and speech signal modeling was found to be similar to that of the LSTM. GRUs have also been shown to perform even better on certain smaller datasets.
Gated Recurrent Unit | Wikipedia
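The gate wiring described above can be made concrete with a small forward pass. This is a minimal pure-Python sketch. It assumes the fully gated formulation given on Wikipedia, where h_t = (1 - z) * h_prev + z * h_tilde. Some libraries, such as PyTorch, swap the roles of z and 1 - z and apply the reset gate at a slightly different point, so treat this as one convention rather than the only one. The parameter names (W_z, U_r, b_h, ...), the random initialisation, and the helper functions are hypothetical and chosen only for illustration. The sketch also counts weights to show why a GRU with 3 gate blocks has fewer parameters than an LSTM with 4.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(M, v):
    # M is a list of rows; returns the matrix-vector product M @ v.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(*vecs):
    # Element-wise sum of equally sized vectors.
    return [sum(vals) for vals in zip(*vecs)]

def init_params(input_size, hidden_size, seed=0):
    # Hypothetical small random weights for the three GRU blocks:
    # "z" = update gate, "r" = reset gate, "h" = candidate state.
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    params = {}
    for g in ("z", "r", "h"):
        params["W_" + g] = mat(hidden_size, input_size)   # input weights
        params["U_" + g] = mat(hidden_size, hidden_size)  # recurrent weights
        params["b_" + g] = [0.0] * hidden_size            # bias
    return params

def gru_step(p, x, h_prev):
    # Update gate: how much of the state to overwrite with new information.
    z = [sigmoid(v) for v in add(matvec(p["W_z"], x), matvec(p["U_z"], h_prev), p["b_z"])]
    # Reset gate: how much of the previous state feeds the candidate.
    r = [sigmoid(v) for v in add(matvec(p["W_r"], x), matvec(p["U_r"], h_prev), p["b_r"])]
    r_h = [ri * hi for ri, hi in zip(r, h_prev)]
    h_tilde = [math.tanh(v) for v in add(matvec(p["W_h"], x), matvec(p["U_h"], r_h), p["b_h"])]
    # No output gate: the full new state is also the output.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_tilde)]

def gru_forward(p, xs, hidden_size):
    h = [0.0] * hidden_size
    outputs = []
    for x in xs:
        h = gru_step(p, x, h)
        outputs.append(h)
    return outputs

def count_params(input_size, hidden_size, num_blocks):
    # Each block has W (hidden x input), U (hidden x hidden) and a bias (hidden).
    return num_blocks * (hidden_size * input_size + hidden_size * hidden_size + hidden_size)

p = init_params(input_size=3, hidden_size=4)
hs = gru_forward(p, [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]], hidden_size=4)
print(len(hs), len(hs[-1]))                            # 2 4
print(count_params(3, 4, 3), count_params(3, 4, 4))    # 96 128 (GRU vs LSTM)
```

If the update gate is driven towards 0, for example with a large negative b_z, the step returns almost exactly h_prev. This is the "how much to keep from the last state" behaviour described in the text.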
A bidirectional gated recurrent unit (BiGRU) looks exactly the same as its unidirectional counterpart in a network diagram. The difference is that the network is connected not just to the past but also to the future, so the output at each time step can draw on both earlier and later elements of the input sequence. Schuster, Mike, and Kuldip K. Paliwal. “Bidirectional recurrent neural networks.” IEEE Transactions on Signal Processing 45.11 (1997): 2673-2681.
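A common way to connect a GRU layer "to the future" is to run two independent GRUs over the same sequence, one left to right and one right to left, and then combine their states at each time step. Concatenation is the usual way to combine them. The sketch below assumes that construction. It uses a hidden size of 1 so every weight is a single float. The parameter names and values are hypothetical, and the forward and backward states are paired as a tuple, which stands in for concatenation.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def scalar_gru_step(p, x, h):
    # Toy GRU with hidden size 1 (hypothetical parameters, one float each).
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])                # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])                # reset gate
    h_tilde = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])  # candidate
    return (1 - z) * h + z * h_tilde

def run(p, xs):
    # Unidirectional pass: returns the hidden state after each input.
    h, out = 0.0, []
    for x in xs:
        h = scalar_gru_step(p, x, h)
        out.append(h)
    return out

def bigru(p_fwd, p_bwd, xs):
    forward = run(p_fwd, xs)
    # The backward GRU reads right-to-left; reverse its outputs to realign with time.
    backward = run(p_bwd, xs[::-1])[::-1]
    # Pair the two directions per step (stands in for concatenation).
    return list(zip(forward, backward))

p_fwd = dict(wz=0.5, uz=0.1, bz=0.0, wr=0.4, ur=0.2, br=0.0, wh=0.9, uh=0.3, bh=0.0)
p_bwd = dict(wz=-0.3, uz=0.2, bz=0.1, wr=0.6, ur=-0.1, br=0.0, wh=0.8, uh=0.5, bh=-0.1)
xs = [1.0, -0.5, 2.0]
states = bigru(p_fwd, p_bwd, xs)
for t, (f, b) in enumerate(states):
    print(t, round(f, 3), round(b, 3))
```

In this construction, changing the last input changes the backward half of the state at the first time step but leaves the forward half untouched. That is the "connected to the future" property in miniature.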