After going through both the "Illustrated Transformer" and "Annotated Transformer" blog posts, I still don't understand how the sinusoidal encodings represent the position of elements in the input sequence.
Is it that, because each row of the encoding matrix (one position in the input sequence) gets a unique waveform, and the encoding at any position can be expressed as a linear function of the encoding at any other position, the transformer can learn relations between positions via those linear functions?
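For concreteness, here is a minimal NumPy sketch of how I understand the encodings are built (the function name `sinusoidal_encoding` is my own; the formulas follow the "Attention Is All You Need" paper), plus a check of the linear-function property I'm asking about:

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model, base=10000.0):
    """PE[pos, 2i]   = sin(pos / base**(2i / d_model))
       PE[pos, 2i+1] = cos(pos / base**(2i / d_model))"""
    positions = np.arange(seq_len)[:, None]       # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]      # shape (1, d_model // 2)
    angles = positions / base ** (dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                  # even columns: sine
    pe[:, 1::2] = np.cos(angles)                  # odd columns: cosine
    return pe

# The "linear function" property for one (sin, cos) frequency pair:
# [sin(w*(p+k)), cos(w*(p+k))] = R(w*k) @ [sin(w*p), cos(w*p)],
# where the rotation-like matrix R depends only on the offset k,
# not on the absolute position p.
pe = sinusoidal_encoding(seq_len=50, d_model=16)
w = 1.0 / 10000.0 ** (0 / 16)   # frequency of the first dimension pair
p, k = 10, 3
R = np.array([[np.cos(w * k),  np.sin(w * k)],
              [-np.sin(w * k), np.cos(w * k)]])
assert np.allclose(pe[p + k, :2], R @ pe[p, :2])
```

If my understanding above is right, the assert should hold for every frequency pair and every offset k, which would be what lets attention pick up relative positions. Is that the intended mechanism?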