Questions tagged [positional-encoding]
17 questions
7
votes
2 answers
What is the difference between the positional encoding techniques of the Transformer and GPT?
I know the original Transformer and GPT (1-3) use two slightly different positional encoding techniques.
More specifically, the GPT papers say the positional encoding is learned. What does that mean? OpenAI's papers don't go into much detail.
How…

Leevo
- 285
- 1
- 9
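On the question above: "learned" positional encoding usually means the position vectors are ordinary trainable parameters (one vector per position index, updated by gradient descent), whereas the original Transformer uses fixed sinusoids. A minimal PyTorch sketch of the two variants, with illustrative class names that are not taken from the GPT code:

import math
import torch
import torch.nn as nn

class SinusoidalPositions(nn.Module):
    """Fixed sinusoidal positions, as in the original Transformer (never trained). dim assumed even."""
    def __init__(self, max_len, dim):
        super().__init__()
        pe = torch.zeros(max_len, dim)
        pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
        freqs = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32) * (-math.log(10000.0) / dim))
        pe[:, 0::2] = torch.sin(pos * freqs)
        pe[:, 1::2] = torch.cos(pos * freqs)
        self.register_buffer("pe", pe)              # a buffer, so the optimizer never updates it

    def forward(self, x):                            # x: (batch, seq_len, dim)
        return x + self.pe[: x.size(1)]

class LearnedPositions(nn.Module):
    """Learned positional encoding: a trainable vector per position index."""
    def __init__(self, max_len, dim):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, dim)    # trained jointly with the rest of the model

    def forward(self, x):                            # x: (batch, seq_len, dim)
        positions = torch.arange(x.size(1), device=x.device)
        return x + self.pos_emb(positions)

The practical difference is that the sinusoidal formula extends to arbitrary positions, while the learned table only covers indices up to max_len and has to be trained like any other weight.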
3
votes
2 answers
Is there a notion of location in Transformer architecture in subsequent self-attention layers?
Transformer architecture (without position embedding) is by the very construction equivariant to the permutation of tokens. Given query $Q \in \mathbb{R}^{n \times d}$ and keys $K \in \mathbb{R}^{n \times d}$ and some permutation matrix $P \in…

spiridon_the_sun_rotator
- 2,454
- 8
- 16
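For readers of the question above, the equivariance claim can be written out directly. With values $V \in \mathbb{R}^{n \times d}$, a permutation matrix $P$ (so $P^\top P = I$), and row-wise softmax, which commutes with simultaneous row and column permutation:

$$
\mathrm{Attn}(PQ, PK, PV)
= \mathrm{softmax}\!\left(\frac{P Q K^\top P^\top}{\sqrt{d}}\right) PV
= P\,\mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d}}\right) P^\top P V
= P\,\mathrm{Attn}(Q, K, V).
$$

So a stack of such layers stays permutation-equivariant; any notion of location has to come from the position embeddings (or from something else that breaks the symmetry, such as a causal mask).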
2
votes
1 answer
Positional Encoding of Time-Series features
I’m trying to use a Transformer Encoder I coded on weather feature vectors: basically 11 weather features per sample, with shape [batch_size, n_features].
I have a data point per day, so this is a time-series but there are no…

Ouilliam
- 21
- 2
2
votes
0 answers
Is there any point in adding the position embedding to the class token in Transformers?
The popular implementations of ViTs by Ross Wightman and Phil Wang add the position embedding to the class tokens as well as to the patches.
Is there any point in doing so?
The purpose of introducing positional embeddings to the Transformer is…

spiridon_the_sun_rotator
- 2,454
- 8
- 16
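A minimal sketch of the pattern the question above describes, with illustrative names (not quoted from either implementation): the position table has num_patches + 1 rows, so the class token at index 0 gets a position vector too.

import torch
import torch.nn as nn

class PatchAndClassInput(nn.Module):
    """Prepend a class token, then add a learned position vector to every token,
    including the class token (the pattern the question asks about)."""
    def __init__(self, num_patches, dim):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))                    # zero init just for brevity
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))      # +1 row covers the class token

    def forward(self, patches):                      # patches: (batch, num_patches, dim)
        cls = self.cls_token.expand(patches.size(0), -1, -1)
        x = torch.cat([cls, patches], dim=1)         # (batch, num_patches + 1, dim)
        return x + self.pos_embed

Since the class token always sits at index 0, its position vector could in principle be folded into the token itself, which seems to be the redundancy the question is pointing at.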
2
votes
0 answers
Positional Encoding in Transformer on multi-variate time series data hurts performance
I set up a transformer model that embeds positional encodings in the encoder. The data is multivariate time-series data.
To experiment with just the positional encoding portion of the code, I set up a toy model: I generated a time series…

Matt
- 121
- 1
2
votes
0 answers
How does positional encoding work in the transformer model?
In the transformer model, to incorporate positional information of texts, the researchers have added a positional encoding to the model. How does positional encoding work? How does the positional encoding system learn the positions when varying…

Eka
- 1,036
- 8
- 23
2
votes
0 answers
How do the sine and cosine functions encode position in the transformer?
After going through both the "Illustrated Transformer" and "Annotated Transformer" blog posts, I still don't understand how the sinusoidal encodings are representing the position of elements in the input sequence.
Is it the fact that since each row…

shoshi
- 121
- 3
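For the question above, writing out the definition from the paper may help:

$$
PE(pos, 2i) = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \qquad
PE(pos, 2i+1) = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right),
$$

so each dimension pair $i$ oscillates with wavelength $2\pi \cdot 10000^{2i/d_{\text{model}}}$, forming a geometric progression from $2\pi$ for the fastest pair to $10000 \cdot 2\pi$ for the slowest. The fast pairs distinguish nearby positions, the slow pairs distinguish far-apart ones, roughly the way low and high digits of a number do, and together they give every position in the usable range a distinct vector.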
1
vote
1 answer
What is the intuition behind position-encoding?
It is clear that word positions are essential for the meaning of a sentence, and so are essential when feeding a sentence (= sequence of words) as a matrix of word embedding vectors into a transformer. I also have understood roughly how positions…

Hans-Peter Stricker
- 811
- 1
- 8
- 20
1
vote
0 answers
How can the Transformer model tell the positional encoding apart from the original data?
I am having trouble understanding positional encoding. Say after word2vec or some encoding algorithm we get the tensor $[0.7, 0.4, 0.2]$ for the second position. Now the final input into the model would add a positional encoding, making it $[0.7 +…

BlueSnake
- 67
- 5
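To make the question above concrete: the model only ever sees the sum, and nothing in the architecture tries to recover the original embedding from it; the network simply learns weights that work on input vectors whose values shift predictably with position. A tiny sketch using the excerpt's example vector and a hypothetical model size of 3 (illustrative numbers, not from the question):

import math
import torch

d_model = 3                                   # hypothetical, to match the 3-value example vector
word = torch.tensor([0.7, 0.4, 0.2])          # embedding of the token at position 1 (the second position)

pos = 1
pe = torch.tensor([
    math.sin(pos / 10000 ** (0 / d_model)),   # dim 0: sin, fastest frequency  (≈ 0.841)
    math.cos(pos / 10000 ** (0 / d_model)),   # dim 1: cos, same frequency     (≈ 0.540)
    math.sin(pos / 10000 ** (2 / d_model)),   # dim 2: sin, slower frequency   (≈ 0.002)
])

model_input = word + pe                       # ≈ [1.541, 0.940, 0.202]; this sum is all the model sees
print(model_input)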
1
vote
1 answer
Which positional encoding does BERT use?
It is a little confusing that some explanations say BERT uses sinusoidal functions for its position encoding, while others say BERT just uses absolute positions.
I checked that Vaswani et al. (2017) used a sinusoidal function for…

yoon
- 111
- 3
1
vote
1 answer
Is Positional Encoding always needed for using Transformer models correctly?
I am trying to make a model that uses a Transformer to see the relationship between several data vectors, but the order of the data is not relevant in this case, so I am not using the Positional Encoding.
Since the performance of models using…

Angelo
- 201
- 2
- 16
1
vote
0 answers
Why have both sine and cosine been used in positional encoding in the transformer model?
The Transformer model proposed in "Attention Is All You Need" uses sinusoidal functions for its positional encoding.
Why have both sine and cosine been used? And why do we need to separate the odd and even dimensions to use different sinusoid…

Shiyu
- 11
- 1
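One commonly cited reason, for the question above: with $\omega_i = 1/10000^{2i/d_{\text{model}}}$, pairing a sine and a cosine at each frequency makes the encoding of a shifted position a fixed linear function (a rotation) of the encoding of the original position:

$$
\begin{pmatrix} \sin\big((pos+k)\,\omega_i\big) \\ \cos\big((pos+k)\,\omega_i\big) \end{pmatrix}
=
\begin{pmatrix} \cos(k\omega_i) & \sin(k\omega_i) \\ -\sin(k\omega_i) & \cos(k\omega_i) \end{pmatrix}
\begin{pmatrix} \sin(pos\,\omega_i) \\ \cos(pos\,\omega_i) \end{pmatrix}.
$$

This is the property Vaswani et al. point to: for any fixed offset $k$, $PE_{pos+k}$ is a linear function of $PE_{pos}$, which they hypothesized would let the model attend by relative position easily. With a sine alone at a given frequency, $\sin\big((pos+k)\,\omega_i\big)$ cannot be recovered from $\sin(pos\,\omega_i)$ alone, since the phase is ambiguous.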
0
votes
1 answer
Why use exponential and log in Positional Encoding of Transformer
This code snippet is from here under the section named "Position embeddings".
class SinusoidalPositionEmbeddings(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, time):
        device…

Jun
- 3
- 2
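For the question above, the usual continuation of that snippet computes the frequencies as exp(arange(half_dim) * -log(10000) / (half_dim - 1)), which is just an algebraically equivalent way of writing 1 / 10000^(i / (half_dim - 1)), since $a^b = e^{\,b \ln a}$; the exp/log form produces the whole geometric sequence of frequencies in one vectorized expression. A quick check (a sketch assuming that continuation, not the blog's exact code):

import math
import torch

dim = 64
half_dim = dim // 2

i = torch.arange(half_dim, dtype=torch.float32)

# exp/log form, as in the (truncated) snippet above
freqs_exp_log = torch.exp(i * -(math.log(10000.0) / (half_dim - 1)))

# direct power form: 1 / 10000^(i / (half_dim - 1))
freqs_power = 1.0 / torch.pow(10000.0, i / (half_dim - 1))

print(torch.allclose(freqs_exp_log, freqs_power))   # True: the two forms give the same numbers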
0
votes
0 answers
How to interpret a CNN output image? Is a CNN expressive enough for particle positions?
Let us suppose we have a square grid and some particles (active vertices) that are distributed on it.
We can construct a 2D image, where basically the input image is a matrix with 0s (no particle) and 1s (particle present). This corresponds to a…

relaxon
- 31
- 4
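The construction the question above describes, as a tiny sketch with hypothetical coordinates:

import torch

grid_size = 8
particles = [(1, 2), (3, 3), (6, 5)]      # hypothetical (row, col) positions of active vertices

image = torch.zeros(grid_size, grid_size)
for r, c in particles:
    image[r, c] = 1.0                      # 1 = particle present, 0 = empty grid vertex

print(image)                               # the binary "image" fed to the CNN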
0
votes
0 answers
Would convolution filters provide better position encoding than the traditional cos/sin functions with k, n, i, and d?
My thoughts are that the traditional position encoding functions introduce too much noise into the network, with essentially random…