
The GRU belongs to the family of recurrent neural networks, which operate on sequence data.

However, I am having trouble understanding the difference between the sequence length and the input features in the case of a GRU cell.

In the case of a CNN, the input tensor is of the form $B \times C \times H \times W$. Here $B$ is the batch size, $C$ is the number of channels, $H$ is the height of the image and $W$ is the width of the image.
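For comparison, a minimal sketch of the CNN input shape (the sizes here are made up; a NumPy array stands in for the framework's tensor):

```python
import numpy as np

# Hypothetical batch of 8 RGB images, each 32x32 pixels.
B, C, H, W = 8, 3, 32, 32
images = np.zeros((B, C, H, W))

print(images.shape)  # (8, 3, 32, 32)
```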

We can visualize the inputs and outputs of a CNN here.

Similarly, in the case of a GRU layer, the input tensor is of the form $B \times L \times I$. Here $B$ is the batch size, $L$ is the length of the sequence and $I$ is the number of input features.
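To make the roles of $L$ and $I$ concrete, here is a minimal NumPy sketch of a single GRU cell (the hidden size, the weight values, and the batch sizes are all invented for illustration). The cell reads one slice of $I$ features per time step and repeats this $L$ times per sequence:

```python
import numpy as np

B, L, I = 2, 5, 4   # batch size, sequence length, input features (made-up sizes)
H = 3               # hidden size (also made up)

rng = np.random.default_rng(0)
x = rng.normal(size=(B, L, I))   # one batch of input sequences

# Random weights for the three GRU transformations (illustration only).
Wz, Uz = rng.normal(size=(I, H)), rng.normal(size=(H, H))
Wr, Ur = rng.normal(size=(I, H)), rng.normal(size=(H, H))
Wh, Uh = rng.normal(size=(I, H)), rng.normal(size=(H, H))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

h = np.zeros((B, H))                 # initial hidden state
for t in range(L):                   # the GRU walks along the L axis, one step at a time
    xt = x[:, t, :]                  # one time step: shape (B, I)
    z = sigmoid(xt @ Wz + h @ Uz)            # update gate
    r = sigmoid(xt @ Wr + h @ Ur)            # reset gate
    h_tilde = np.tanh(xt @ Wh + (r * h) @ Uh)  # candidate hidden state
    h = (1 - z) * h + z * h_tilde

print(h.shape)  # (2, 3): one final hidden state per sequence in the batch
```

So $I$ is the width of each time step's feature vector, while $L$ is how many such vectors the cell consumes per sequence.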

I want to know what exactly the input features and the sequence length are. If there are any visualizations, please provide them.

hanugm
  • If we're using embeddings, $I$ might be the embedding size. If you're using the GRU for machine translation, $L$ is probably the maximum number of words that you expect in a sentence. I suppose this is the answer to your question, but I am not sure. Can you please provide a source that describes the input of the GRU unit as a tensor of shape $B \times L \times I$? – nbro Mar 06 '22 at 00:02
  • @nbro I think you missed the link from the comment. – hanugm Mar 06 '22 at 23:21
  • @nbro I updated the link. – hanugm Mar 06 '22 at 23:22
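To make the first comment above concrete, here is a sketch of how $L$ and $I$ arise when embedding sentences (the vocabulary, sentences, and sizes are all invented): $L$ is the longest sentence in the batch, shorter sentences are padded, and $I$ is the embedding size.

```python
import numpy as np

# Invented toy vocabulary and sentences.
vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3, "down": 4}
sentences = [["the", "cat", "sat"], ["the", "cat", "sat", "down"]]

B = len(sentences)                  # batch size
L = max(len(s) for s in sentences)  # sequence length = longest sentence
I = 6                               # embedding size (chosen arbitrarily)

# One random embedding vector of size I per vocabulary entry.
embedding = np.random.default_rng(1).normal(size=(len(vocab), I))

batch = np.zeros((B, L, I))
for b, sent in enumerate(sentences):
    ids = [vocab[w] for w in sent] + [0] * (L - len(sent))  # pad with <pad> to length L
    batch[b] = embedding[ids]

print(batch.shape)  # (2, 4, 6) = (B, L, I)
```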

0 Answers