Questions tagged [positional-encoding]
17 questions
7
votes
2 answers
What is the difference between the positional encoding techniques of the Transformer and GPT?
I know the original Transformer and GPT (1-3) use two slightly different positional encoding techniques.
More specifically, the GPT papers say the positional encoding is learned. What does that mean? OpenAI's papers don't go into much detail.
How…

Leevo
- 285
- 1
- 9
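On the question above: "learned" positional encoding usually means the position vectors are ordinary trainable parameters (one vector per position index, updated by gradient descent), whereas the original Transformer uses fixed sinusoids. A minimal PyTorch sketch of the two variants, with illustrative class names that are not taken from the GPT code:

import math
import torch
import torch.nn as nn

class SinusoidalPositions(nn.Module):
    """Fixed sinusoidal positions, as in the original Transformer (never trained). dim assumed even."""
    def __init__(self, max_len, dim):
        super().__init__()
        pe = torch.zeros(max_len, dim)
        pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
        freqs = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32) * (-math.log(10000.0) / dim))
        pe[:, 0::2] = torch.sin(pos * freqs)
        pe[:, 1::2] = torch.cos(pos * freqs)
        self.register_buffer("pe", pe)              # a buffer, so the optimizer never updates it

    def forward(self, x):                            # x: (batch, seq_len, dim)
        return x + self.pe[: x.size(1)]

class LearnedPositions(nn.Module):
    """Learned positional encoding: a trainable vector per position index."""
    def __init__(self, max_len, dim):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, dim)    # trained jointly with the rest of the model

    def forward(self, x):                            # x: (batch, seq_len, dim)
        positions = torch.arange(x.size(1), device=x.device)
        return x + self.pos_emb(positions)

The practical difference is that the sinusoidal formula extends to arbitrary positions, while the learned table only covers indices up to max_len and has to be trained like any other weight.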
3
votes
2 answers
Is there a notion of location in Transformer architecture in subsequent self-attention layers?
Transformer architecture (without position embedding) is by the very construction equivariant to the permutation of tokens. Given query $Q \in \mathbb{R}^{n \times d}$ and keys $K \in \mathbb{R}^{n \times d}$ and some permutation matrix $P \in…

spiridon_the_sun_rotator
- 2,454
- 8
- 16
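For readers of the question above, the equivariance claim can be written out directly. With values $V \in \mathbb{R}^{n \times d}$, a permutation matrix $P$ (so $P^\top P = I$), and row-wise softmax, which commutes with simultaneous row and column permutation:

$$
\mathrm{Attn}(PQ, PK, PV)
= \mathrm{softmax}\!\left(\frac{P Q K^\top P^\top}{\sqrt{d}}\right) PV
= P\,\mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d}}\right) P^\top P V
= P\,\mathrm{Attn}(Q, K, V).
$$

So a stack of such layers stays permutation-equivariant; any notion of location has to come from the position embeddings (or from something else that breaks the symmetry, such as a causal mask).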
2
votes
1 answer
Positional Encoding of Time-Series features
I’m trying to use a Transformer Encoder I coded on weather feature vectors: basically 11 weather features per sample, with shape [batch_size, n_features].
I have a data point per day, so this is a time-series but there are no…

Ouilliam
- 21
- 2
2
votes
0 answers
Is there any point in adding the position embedding to the class token in Transformers?
The popular implementations of ViTs by Ross Wightman and Phil Wang add the position embedding to the class tokens as well as to the patches.
Is there any point in doing so?
The purpose of introducing positional embeddings to the Transformer is…

spiridon_the_sun_rotator
- 2,454
- 8
- 16
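A minimal sketch of the pattern the question above describes, with illustrative names (not quoted from either implementation): the position table has num_patches + 1 rows, so the class token at index 0 gets a position vector too.

import torch
import torch.nn as nn

class PatchAndClassInput(nn.Module):
    """Prepend a class token, then add a learned position vector to every token,
    including the class token (the pattern the question asks about)."""
    def __init__(self, num_patches, dim):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))                    # zero init just for brevity
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))      # +1 row covers the class token

    def forward(self, patches):                      # patches: (batch, num_patches, dim)
        cls = self.cls_token.expand(patches.size(0), -1, -1)
        x = torch.cat([cls, patches], dim=1)         # (batch, num_patches + 1, dim)
        return x + self.pos_embed

Since the class token always sits at index 0, its position vector could in principle be folded into the token itself, which seems to be the redundancy the question is pointing at.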
2
votes
0 answers
Positional Encoding in Transformer on multi-variate time series data hurts performance
I set up a transformer model that embeds positional encodings in the encoder. The data is multivariate time-series data.
To experiment with just the positional encoding portion of the code, I set up a toy model: I generated a time series…

Matt
- 121
- 1
2
votes
0 answers
How does positional encoding work in the transformer model?
In the transformer model, to incorporate positional information of texts, the researchers have added a positional encoding to the model. How does positional encoding work? How does the positional encoding system learn the positions when varying…

Eka
- 1,036
- 8
- 23
2
votes
0 answers
How do the sine and cosine functions encode position in the transformer?
After going through both the "Illustrated Transformer" and "Annotated Transformer" blog posts, I still don't understand how the sinusoidal encodings are representing the position of elements in the input sequence.
Is it the fact that since each row…

shoshi
- 121
- 3
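For the question above, writing out the definition from the paper may help:

$$
PE(pos, 2i) = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \qquad
PE(pos, 2i+1) = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right),
$$

so each dimension pair $i$ oscillates with wavelength $2\pi \cdot 10000^{2i/d_{\text{model}}}$, forming a geometric progression from $2\pi$ for the fastest pair to $10000 \cdot 2\pi$ for the slowest. The fast pairs distinguish nearby positions, the slow pairs distinguish far-apart ones, roughly the way low and high digits of a number do, and together they give every position in the usable range a distinct vector.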
1
vote
1 answer
What is the intuition behind position-encoding?
It is clear that word positions are essential for the meaning of a sentence, and so are essential when feeding a sentence (= sequence of words) as a matrix of word embedding vectors into a transformer. I also have understood roughly how positions…

Hans-Peter Stricker
- 811
- 1
- 8
- 20
1
vote
0 answers
How can the Transformer model tell the positional encoding apart from the original data?
I am having trouble understanding positional encoding. Say after word2vec or some encoding algorithm we get the tensor $[0.7, 0.4, 0.2]$ for the second position. Now the final input into the model would add a positional encoding, making it $[0.7 +…

BlueSnake
- 67
- 5
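To make the question above concrete: the model only ever sees the sum, and nothing in the architecture tries to recover the original embedding from it; the network simply learns weights that work on input vectors whose values shift predictably with position. A tiny sketch using the excerpt's example vector and a hypothetical model size of 3 (illustrative numbers, not from the question):

import math
import torch

d_model = 3                                   # hypothetical, to match the 3-value example vector
word = torch.tensor([0.7, 0.4, 0.2])          # embedding of the token at position 1 (the second position)

pos = 1
pe = torch.tensor([
    math.sin(pos / 10000 ** (0 / d_model)),   # dim 0: sin, fastest frequency  (≈ 0.841)
    math.cos(pos / 10000 ** (0 / d_model)),   # dim 1: cos, same frequency     (≈ 0.540)
    math.sin(pos / 10000 ** (2 / d_model)),   # dim 2: sin, slower frequency   (≈ 0.002)
])

model_input = word + pe                       # ≈ [1.541, 0.940, 0.202]; this sum is all the model sees
print(model_input)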
1
vote
1 answer
Which positional encoding does BERT use?
It is a little confusing that some explanations say BERT uses sinusoidal functions for its position encoding, while others say BERT just uses absolute positions.
I checked that Vaswani et al. (2017) used a sinusoidal function for…

yoon
- 111
- 3
1
vote
1 answer
Is Positional Encoding always needed for using Transformer models correctly?
I am trying to make a model that uses a Transformer to see the relationship between several data vectors, but the order of the data is not relevant in this case, so I am not using the Positional Encoding.
Since the performance of models using…

Angelo
- 201
- 2
- 16
1
vote
0 answers
Why have both sine and cosine been used in positional encoding in the transformer model?
The Transformer model proposed in "Attention Is All You Need" uses sinusoidal functions for its positional encoding.
Why have both sine and cosine been used? And why do we need to separate the odd and even dimensions to use different sinusoid…

Shiyu
- 11
- 1
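One commonly cited reason, for the question above: with $\omega_i = 1/10000^{2i/d_{\text{model}}}$, pairing a sine and a cosine at each frequency makes the encoding of a shifted position a fixed linear function (a rotation) of the encoding of the original position:

$$
\begin{pmatrix} \sin\big((pos+k)\,\omega_i\big) \\ \cos\big((pos+k)\,\omega_i\big) \end{pmatrix}
=
\begin{pmatrix} \cos(k\omega_i) & \sin(k\omega_i) \\ -\sin(k\omega_i) & \cos(k\omega_i) \end{pmatrix}
\begin{pmatrix} \sin(pos\,\omega_i) \\ \cos(pos\,\omega_i) \end{pmatrix}.
$$

This is the property Vaswani et al. point to: for any fixed offset $k$, $PE_{pos+k}$ is a linear function of $PE_{pos}$, which they hypothesized would let the model attend by relative position easily. With a sine alone at a given frequency, $\sin\big((pos+k)\,\omega_i\big)$ cannot be recovered from $\sin(pos\,\omega_i)$ alone, since the phase is ambiguous.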
0
votes
1 answer
Why use exponential and log in Positional Encoding of Transformer
This code snippet is from here under the section named "Position embeddings".
class SinusoidalPositionEmbeddings(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, time):
        device…

Jun
- 3
- 2
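For the question above, the usual continuation of that snippet computes the frequencies as exp(arange(half_dim) * -log(10000) / (half_dim - 1)), which is just an algebraically equivalent way of writing 1 / 10000^(i / (half_dim - 1)), since $a^b = e^{\,b \ln a}$; the exp/log form produces the whole geometric sequence of frequencies in one vectorized expression. A quick check (a sketch assuming that continuation, not the blog's exact code):

import math
import torch

dim = 64
half_dim = dim // 2

i = torch.arange(half_dim, dtype=torch.float32)

# exp/log form, as in the (truncated) snippet above
freqs_exp_log = torch.exp(i * -(math.log(10000.0) / (half_dim - 1)))

# direct power form: 1 / 10000^(i / (half_dim - 1))
freqs_power = 1.0 / torch.pow(10000.0, i / (half_dim - 1))

print(torch.allclose(freqs_exp_log, freqs_power))   # True: the two forms give the same numbers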
0
votes
0 answers
How to interpret a CNN output image? Is a CNN expressive enough for particle positions?
Let us suppose we have a square grid and some particles (active vertices) that are distributed on it.
We can construct a 2D image, where basically the input image is a matrix with 0s (no particle) and 1s (particle present). This corresponds to a…

relaxon
- 31
- 4
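The construction the question above describes, as a tiny sketch with hypothetical coordinates:

import torch

grid_size = 8
particles = [(1, 2), (3, 3), (6, 5)]      # hypothetical (row, col) positions of active vertices

image = torch.zeros(grid_size, grid_size)
for r, c in particles:
    image[r, c] = 1.0                      # 1 = particle present, 0 = empty grid vertex

print(image)                               # the binary "image" fed to the CNN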
0
votes
0 answers
Would convolution filters provide better position encoding than the traditional cos/sin functions with k, n, i, and d?
My thoughts are that the traditional position encoding functions introduce too much noise into the network, with essentially random…