
After going through both the "Illustrated Transformer" and "Annotated Transformer" blog posts, I still don't understand how the sinusoidal encodings represent the position of elements in the input sequence.

Is it that, because each row (input token) of the matrix (the entire input sequence) gets a unique waveform as its encoding, and the encoding at any position can be written as a linear function of the encoding at any other position, the transformer can learn relations between these rows via those linear functions?
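
For concreteness, here is a small NumPy sketch (my own, not taken from either post) of the encoding defined in "Attention Is All You Need", together with a numerical check of the linear-map property I am asking about: for a fixed offset `k`, a position-independent 2x2 rotation per sin/cos frequency pair maps the encoding at position `pos` to the encoding at `pos + k`.

```python
import numpy as np

def sinusoidal_encoding(num_positions, d_model):
    """Positional encodings from "Attention Is All You Need":
    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    """
    positions = np.arange(num_positions)[:, None]        # (num_positions, 1)
    dims = np.arange(0, d_model, 2)                      # even dimensions
    angles = positions / (10000 ** (dims / d_model))     # (num_positions, d_model/2)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Check the "linear function" property: for a fixed offset k, one 2x2
# rotation per frequency pair maps PE[pos] to PE[pos + k], regardless of pos.
d_model, k = 8, 3
pe = sinusoidal_encoding(50, d_model)
freqs = 1.0 / (10000 ** (np.arange(0, d_model, 2) / d_model))

for pos in (0, 7, 20):
    predicted = np.empty(d_model)
    for i, w in enumerate(freqs):
        c, s = np.cos(k * w), np.sin(k * w)
        sin_p, cos_p = pe[pos, 2 * i], pe[pos, 2 * i + 1]
        predicted[2 * i] = sin_p * c + cos_p * s      # sin(pos*w + k*w)
        predicted[2 * i + 1] = cos_p * c - sin_p * s  # cos(pos*w + k*w)
    assert np.allclose(predicted, pe[pos + k])
```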

    [What is the positional encoding in the transformer model?](https://datascience.stackexchange.com/q/51065/10640). The explanation therein may help you. – user27495 Jul 28 '19 at 17:30

0 Answers