I have been digging through articles across the internet on the computational complexity of GRUs. Interestingly, I came across this article, http://cse.iitkgp.ac.in/~psraja/FNNs%20,RNNs%20,LSTM%20and%20BLSTM.pdf, which uses the following notation:
Let $I$ be the number of inputs, $K$ be the number of outputs, and $H$ be the number of cells in the hidden layer.
It then states that the computational complexity of FNNs, RNNs, BRNNs, LSTMs, and BLSTMs is $O(W)$, where $W$ is the total number of edges (weights) in the network:
For FNN: $W = IH + HK$ (I get this part: in a fully connected network there is an edge from each input to each hidden node, and from each hidden node to each output node.)
For RNN: $W = IH + H^2 + HK$ (The formula is nearly the same as for the FNN, but where does this $H^2$ term come from?)
For LSTM: $W = 4IH + 4H^2 + 3H + HK$ (This one is harder to follow: where do the 4's and the 3 in the equation come from?)
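To make the formulas concrete, here is a quick sketch that just evaluates each count in code. The comments are my own guesses at what the terms represent, not something the article spells out:

```python
def fnn_weights(I, H, K):
    # Fully connected FNN: input->hidden edges plus hidden->output edges
    return I * H + H * K

def rnn_weights(I, H, K):
    # Same as the FNN, plus (my guess) H*H recurrent edges from the
    # hidden layer back to itself at the next time step
    return I * H + H * H + H * K

def lstm_weights(I, H, K):
    # Per the article's formula; my guess is 4 weight sets (3 gates +
    # cell input) for both input and recurrent connections, plus a 3H
    # term (peepholes?), plus hidden->output edges
    return 4 * I * H + 4 * H * H + 3 * H + H * K

I, H, K = 10, 20, 5
print(fnn_weights(I, H, K))   # 200 + 100 = 300
print(rnn_weights(I, H, K))   # 200 + 400 + 100 = 700
print(lstm_weights(I, H, K))  # 800 + 1600 + 60 + 100 = 2560
```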
Continuing with this notation, is there a similar formula for the GRU as well? That would be very helpful for understanding.