
When using an RNN to encode a sentence, one normally takes each word, passes it through an embedding layer, and then uses the dense embedding as the input to the RNN.
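
For concreteness, here is a rough sketch of the standard setup I am describing (I am using PyTorch purely for illustration; the vocabulary size and dimensions are made up):

```python
import torch
import torch.nn as nn

# Illustrative sizes only
vocab_size, embed_dim, hidden_dim = 10_000, 300, 512

embedding = nn.Embedding(vocab_size, embed_dim)  # word index -> dense vector
rnn = nn.RNN(input_size=embed_dim, hidden_size=hidden_dim, batch_first=True)

token_ids = torch.randint(0, vocab_size, (1, 7))  # a "sentence" of 7 word indices
dense_inputs = embedding(token_ids)               # shape: (1, 7, embed_dim)
outputs, h_n = rnn(dense_inputs)                  # the RNN consumes the dense embeddings
```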

Let's say that, instead of using dense embeddings, I used a one-hot representation for each word and fed that sequence into the RNN (a rough sketch of what I mean follows the two options below). My question is which of these two outcomes is correct:

  1. Because of the way an RNN combines its inputs, and since one-hot vectors are all orthogonal to each other, nothing meaningful can be combined, so the entire setup does not make sense.

  2. The setup does make sense and will still work, but it will not be as effective as using a dense embedding.
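
This is roughly what I mean by the one-hot version (again just a sketch with made-up sizes; the only structural change from the snippet above is that the RNN's input size now equals the vocabulary size and there is no embedding layer):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes only; no embedding layer this time
vocab_size, hidden_dim = 10_000, 512

rnn = nn.RNN(input_size=vocab_size, hidden_size=hidden_dim, batch_first=True)

token_ids = torch.randint(0, vocab_size, (1, 7))                        # a "sentence" of 7 word indices
one_hot_inputs = F.one_hot(token_ids, num_classes=vocab_size).float()   # shape: (1, 7, vocab_size)
outputs, h_n = rnn(one_hot_inputs)                                      # the RNN consumes the one-hot vectors
```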

I know I could run an experiment and see what happens, but this is fundamentally a theoretical question, and I would appreciate it if someone could clarify it so that I have a better understanding of how RNNs combine inputs. I suspect that the answer would be the same regardless of whether we are discussing a vanilla RNN, an LSTM, or a GRU, but if that is not the case, please explain why.

Thank you.

chessprogrammer
Interesting question - curious to see how others respond, but I vote for #2 (more time to process one-hot vectors, since your vectors will likely have more dimensions than the 512 or 768 typically found in dense embeddings, and you will see lower accuracy from your model). – Adnan S Nov 26 '20 at 20:23

0 Answers