I'm trying to understand how the size of the hidden state affects the GRU.
For example, suppose I want to make a GRU count. I'll feed it three consecutive numbers (say 1, 2, 3) and expect it to predict the fourth (4).
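To make the setup concrete, here is a minimal sketch of what I have in mind, written in plain Python so that the place where the hidden size enters is visible. Everything in it is an assumption for illustration: the names `make_gru`, `gru_step`, `predict` and `count_params` are made up, the weights are random and untrained, there is a single bias per gate, and the final linear readout is just one way to turn the last hidden state into a number. It is not how any particular library implements a GRU.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def make_gru(hidden_size, seed=0):
    """Random, untrained weights for a GRU with scalar input (illustrative only)."""
    rng = random.Random(seed)

    def vec():
        return [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]

    def mat():
        return [vec() for _ in range(hidden_size)]

    # One (input weights, recurrent weights, bias) triple per gate.
    gates = {name: (vec(), mat(), vec()) for name in ("reset", "update", "cand")}
    readout = vec()  # assumed linear layer: final hidden state -> one number
    return gates, readout


def gru_step(gates, x, h):
    """One GRU step for a scalar input x and hidden state h (list of floats)."""
    size = len(h)

    def pre(name, hidden):
        w, u, b = gates[name]
        return [w[i] * x + sum(u[i][j] * hidden[j] for j in range(size)) + b[i]
                for i in range(size)]

    r = [sigmoid(v) for v in pre("reset", h)]   # reset gate
    z = [sigmoid(v) for v in pre("update", h)]  # update gate
    # Candidate state: the reset gate scales the previous state first.
    n = [math.tanh(v) for v in pre("cand", [ri * hi for ri, hi in zip(r, h)])]
    # Blend the candidate with the previous state using the update gate.
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def predict(hidden_size, sequence, seed=0):
    """Run the sequence through the GRU and read out a single prediction."""
    gates, readout = make_gru(hidden_size, seed)
    h = [0.0] * hidden_size
    for x in sequence:
        h = gru_step(gates, x, h)
    return h, sum(w * hi for w, hi in zip(readout, h))


def count_params(hidden_size):
    """Weights in this sketch: three gates plus the readout vector."""
    per_gate = hidden_size + hidden_size * hidden_size + hidden_size
    return 3 * per_gate + hidden_size


if __name__ == "__main__":
    for size in (1, 4, 16):
        h, y = predict(size, [1.0, 2.0, 3.0])
        print(size, len(h), count_params(size), y)
```

The hidden size sets the length of `h` carried between steps, and the number of weights grows roughly with its square. Since the weights here are untrained, the printed prediction is meaningless; the sketch only shows the shapes involved.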
How should I choose the size of the hidden state of a GRU?