For questions related to the gated recurrent unit (GRU), a simplification of the LSTM unit, which is itself a more sophisticated alternative to the standard unit of a recurrent neural network (RNN). An RNN that uses GRU units is often called a GRU network. GRUs were introduced in the paper "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation" (2014) by Kyunghyun Cho et al.
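As a quick reference, a common form of the GRU update equations (biases omitted; note that some sources swap the roles of $\mathbf{z}_t$ and $1 - \mathbf{z}_t$ in the last line):

$$
\begin{aligned}
\mathbf{z}_t &= \sigma(W_z \mathbf{x}_t + U_z \mathbf{h}_{t-1}) && \text{(update gate)}\\
\mathbf{r}_t &= \sigma(W_r \mathbf{x}_t + U_r \mathbf{h}_{t-1}) && \text{(reset gate)}\\
\tilde{\mathbf{h}}_t &= \tanh(W \mathbf{x}_t + U(\mathbf{r}_t \odot \mathbf{h}_{t-1})) && \text{(candidate state)}\\
\mathbf{h}_t &= \mathbf{z}_t \odot \mathbf{h}_{t-1} + (1 - \mathbf{z}_t) \odot \tilde{\mathbf{h}}_t && \text{(new state)}
\end{aligned}
$$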
Questions tagged [gated-recurrent-unit]
18 questions
6
votes
2 answers
Why are GRU and LSTM better than standard RNNs?
It seems that standard RNNs are limited in their use cases and have been outperformed by other recurrent architectures, such as the LSTM and GRU.

Deep Analytics
- 63
- 2
6
votes
1 answer
What's the difference between LSTM and GRU?
I have been reading about LSTMs and GRUs, which are recurrent neural networks (RNNs). The difference between the two is the number and specific type of gates that they have. The GRU has an update gate, which has a similar role to the role of the…

Pluviophile
- 1,223
- 5
- 17
- 37
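One concrete way to see the difference the number of gates makes is to count parameters. The sketch below assumes a simple single-layer formulation in which every gate (or candidate block) has an input weight matrix, a recurrent weight matrix, and one bias vector; real implementations may differ in bias handling.

```python
def rnn_params(input_size, hidden_size, n_blocks):
    """Parameter count for a recurrent layer with n_blocks gate-like blocks.

    Each block has: W (hidden x input), U (hidden x hidden), bias (hidden).
    """
    return n_blocks * (hidden_size * input_size + hidden_size ** 2 + hidden_size)

I, H = 128, 256
vanilla = rnn_params(I, H, 1)  # one block: the state update itself
gru = rnn_params(I, H, 3)      # update gate, reset gate, candidate state
lstm = rnn_params(I, H, 4)     # input, forget, output gates + cell candidate

print(vanilla, gru, lstm)      # GRU has exactly 3/4 of the LSTM's parameters
```

Under these assumptions a GRU always has 3/4 as many parameters as an LSTM of the same size, which is one reason it is often cited as the cheaper of the two gated units.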
4
votes
0 answers
RNN models display an upper limit on predictions
I have trained an RNN, a GRU, and an LSTM on the same dataset, and looking at their respective predictions I have observed that they all display an upper limit on the value they can predict. I have attached a graph for each of the models, which shows the…

Kornephoros
- 41
- 3
3
votes
0 answers
What is the time complexity for training a gated recurrent unit (GRU) neural network using back-propagation through time?
Let us assume we have a GRU network containing $H$ layers to process a training dataset with $K$ tuples, $I$ features, and $H_i$ nodes in each layer.
I have a pretty basic idea of how the complexity of algorithms is calculated; however, with the…

rahul tomar
- 51
- 4
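A rough way to reason about the question above is to count the multiply-accumulate operations (MACs) a GRU layer performs per timestep and multiply by the sequence length. This is only a sketch: it counts the three gate-like matrix products, ignores element-wise operations, and treats the backward pass of BPTT as a constant factor on top of the forward cost.

```python
def gru_step_macs(input_size, hidden_size):
    # 3 gate-like blocks (update, reset, candidate), each doing one
    # (hidden x input) and one (hidden x hidden) matrix-vector product
    return 3 * (hidden_size * input_size + hidden_size ** 2)

def gru_unroll_macs(seq_len, input_size, hidden_size):
    # Forward pass over an unrolled sequence; BPTT's backward pass has the
    # same asymptotic cost, i.e. O(seq_len * (I*H + H^2)) overall
    return seq_len * gru_step_macs(input_size, hidden_size)

print(gru_unroll_macs(100, 128, 256))
```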
3
votes
1 answer
How do I choose the size of the hidden state of a GRU?
I'm trying to understand how the size of the hidden state affects the GRU.
For example, suppose I want to make a GRU count. I'm going to feed it three numbers, and I expect it to predict the fourth.
How should I choose the size of the hidden…

razvanc92
- 1,108
- 7
- 18
2
votes
1 answer
Multiple GRU layers to improve a text generation
I am using the model in this colab https://colab.research.google.com/github/tensorflow/text/blob/master/docs/tutorials/text_generation.ipynb#scrollTo=AM2Uma_-yVIq for Shakespeare like text generation.
It looks like this
class…

kiriloff
- 121
- 4
2
votes
0 answers
Can One-Hot Vectors be used as Inputs for Recurrent Neural Networks?
When using an RNN to encode a sentence, one normally takes each word, passes it through an embedding layer, and then uses the dense embedding as the input into the RNN.
Let's say instead of using dense embeddings, I used a one-hot representation for…

chessprogrammer
- 2,215
- 2
- 12
- 23
2
votes
0 answers
Incorporating domain knowledge into recurrent network
I am currently trying to solve a classification task with a recurrent artificial neural network (RNN).
Situation
There are up to 350 inputs (X) mapped to one categorical output (y) (13 different classes).
The sequence to predict is deterministic in…

DoKi
- 31
- 1
1
vote
0 answers
The model's accuracy suddenly becomes unreasonably good at the beginning of the training process. I need an explanation
I am practicing machine translation using a seq2seq model (more specifically with GRU/LSTM units). The following is my first model:
This model first achieved an accuracy score of about 0.03 and gradually improved after that. It seems normal.
But when I…

Đạt Trần
- 11
- 2
1
vote
1 answer
Which calculation to use for GRU
I'm trying to implement a GRU in my own neural network library, but when I did some research I stumbled on some inconsistencies.
When calculating a cell there are as many legitimate resources which state that $\mathbf{h}_t =…

Johannes K.
- 38
- 5
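The inconsistency asked about above is usually just a convention swap: some sources have the update gate weight the old state, others have it weight the new candidate. The two forms compute the same family of functions, since one becomes the other by relabelling $z$ as $1 - z$. A minimal scalar sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def update_cho(z, h_prev, h_cand):
    # Cho et al. (2014) convention: z gates the OLD state
    return z * h_prev + (1 - z) * h_cand

def update_alt(z, h_prev, h_cand):
    # the other common convention: z gates the NEW candidate
    return (1 - z) * h_prev + z * h_cand

z, h_prev, h_cand = sigmoid(0.3), 0.5, -0.2
# the two forms agree once z is relabelled as 1 - z
assert abs(update_cho(z, h_prev, h_cand) - update_alt(1 - z, h_prev, h_cand)) < 1e-12
```

Since the gate's weights are learned, a network trained under either convention can represent the same updates; the choice only matters for matching a specific reference implementation.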
1
vote
0 answers
What is the computational complexity in terms of Big-O notation of a Gated Recurrent Unit Neural network?
I have been digging up articles across the internet on the computational complexity of GRUs. Interestingly, I came across this article, http://cse.iitkgp.ac.in/~psraja/FNNs%20,RNNs%20,LSTM%20and%20BLSTM.pdf, where it takes the following…

rahul tomar
- 51
- 4
1
vote
1 answer
Inner working of Bidirectional RNNs
I'm trying to understand how Bidirectional RNNs work.
Specifically, I want to know whether a single cell is used with different states, or two different cells are used, each having independent parameters.
In pythonic pseudocode,
Implementation…

Susmit Agrawal
- 125
- 4
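On the question above: in the standard formulation, a bidirectional RNN uses two separate cells with independent parameters, one reading the sequence forward and one reading it backward, with the backward states re-aligned and concatenated per position. The toy sketch below uses simple linear cells standing in for GRU cells to show the wiring; the cell functions are illustrative, not real GRU updates.

```python
def run_rnn(cell, state, inputs):
    # unroll one cell over a sequence, collecting the state at each step
    states = []
    for x in inputs:
        state = cell(state, x)
        states.append(state)
    return states

# two INDEPENDENT toy cells (note the separate "weights")
fwd_cell = lambda h, x: 0.5 * h + 1.0 * x   # forward-direction cell
bwd_cell = lambda h, x: 0.9 * h - 0.5 * x   # backward-direction cell

seq = [1.0, 2.0, 3.0]
fwd = run_rnn(fwd_cell, 0.0, seq)
# backward cell consumes the reversed sequence; reverse its outputs
# again so position t lines up with the original input order
bwd = run_rnn(bwd_cell, 0.0, list(reversed(seq)))[::-1]

# the output at position t is the concatenation of both directions' states
outputs = list(zip(fwd, bwd))
```

The key point is that `fwd_cell` and `bwd_cell` share nothing: each direction has its own parameters and its own hidden state.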
0
votes
0 answers
Feature extraction from log-mel spectrograms using CNNs
I am currently working on an ASR-related project in which I would like to combine Convolutional Neural Network with GRU Network and CTC loss function.
The idea is to use the CNN to extract representative features from log-mel spectrograms, and then…

conyai
- 1
- 1
0
votes
0 answers
How to visualize the input and output of the GRU cell?
GRU belongs to the family of recurrent neural networks. This family of neural networks works on sequence data.
But, it is taking time for me to understand the differences between sequence length and input in the case of a GRU cell.
In the case of a…

hanugm
- 3,571
- 3
- 18
- 50
0
votes
1 answer
Why does validation accuracy stop rising so soon?
I have implemented a GRU to deal with YouTube comment data. I am a bit confused about why the validation score seems to even out around 70% and then keeps rising; this doesn't look like overfitting from what I'm used to, since it keeps rising. Is…

nibs
- 1
- 2