Questions tagged [encoder-decoder]

17 questions
13
votes
4 answers

What exactly is a hidden state in an LSTM and RNN?

I'm working on a project where we use an encoder-decoder architecture. We decided to use an LSTM for both the encoder and the decoder because of its hidden states. In my specific case, the hidden state of the encoder is passed to the decoder, and this…
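A minimal PyTorch sketch of the setup described above, with the encoder's final (h, c) state used to initialize the decoder's LSTM; all sizes and names below are illustrative assumptions, not taken from the question.

```python
import torch
import torch.nn as nn

# Toy dimensions (assumed): the encoder's final (h, c) initializes the decoder.
enc = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
dec = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)

src = torch.randn(4, 10, 16)   # (batch, src_len, features)
tgt = torch.randn(4, 7, 8)     # (batch, tgt_len, features)

_, (h, c) = enc(src)           # h, c: (num_layers, batch, hidden_size)
dec_out, _ = dec(tgt, (h, c))  # decoder starts from the encoder's hidden state
print(dec_out.shape)           # torch.Size([4, 7, 32])
```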
4
votes
1 answer

Why do we need both encoder and decoder in sequence to sequence prediction?

Why do we need both encoder and decoder in sequence to sequence prediction? We could just have a single RNN that, given input $x$, outputs some value $y(t)$ and hidden state $h(t)$. Next, given $h(t)$ and $y(t)$, the next output $y(t+1)$ and hidden…
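A rough sketch of the single-RNN loop this question proposes, feeding y(t) and h(t) back in to obtain y(t+1); the GRU cell, sizes, and names are illustrative assumptions, not part of the question.

```python
import torch
import torch.nn as nn

# Illustrative single-RNN loop (assumed toy sizes): y(t) and h(t) are fed back
# to produce y(t+1), with no separate encoder or decoder.
cell = nn.GRUCell(input_size=8, hidden_size=32)
readout = nn.Linear(32, 8)

y = torch.randn(4, 8)      # initial input x
h = torch.zeros(4, 32)     # initial hidden state
outputs = []
for t in range(5):
    h = cell(y, h)         # h(t) from the previous output and hidden state
    y = readout(h)         # y(t) read out from h(t)
    outputs.append(y)
```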
3
votes
0 answers

What are the inputs (and shapes) to K/V/Q of the self-attention in EACH decoder block of a language-translation Transformer during inference?

The Transformer model from the original attention paper has a decoder that works differently during inference than during training. I'm trying to understand the shapes used in the decoder (both the self-attention and encoder-decoder attention blocks), but it's very…
2
votes
0 answers

Combining GANs and NLP for AI-Based Programming: Generating Input-Output Templates for Computer Functions

I would like to combine GANs and NLP to create a system that can take an input and generate an appropriate output. For example, given the input 9 to the power of 2, the system would output pow(9,2). I am not entirely sure how to research this, but I…
1
vote
1 answer

Why can decoder-only transformers be so good at machine translation?

In my understanding, encoder-decoder transformers for translation are trained with sentence or text pairs. How can it be explained in simple (high-level) terms that decoder-only transformers (e.g. GPT) are so good at machine translation, even though…
1
vote
1 answer

Transformers: how does stacking work?

An encoder block takes Q, K, V as inputs but produces a single output, i.e. 3 inputs vs. 1 output. How do you stack those blocks? Is there a more detailed diagram?
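One way to picture the stacking, sketched below with PyTorch's built-in encoder layer (sizes are illustrative assumptions): the single output of block i is reused as Q, K, and V of block i+1, so the "3 inputs vs. 1 output" mismatch disappears.

```python
import torch
import torch.nn as nn

# Illustrative stack of encoder blocks: each block's single output tensor
# becomes Q = K = V of the next block's self-attention.
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True) for _ in range(3)]
)

x = torch.randn(2, 10, 64)  # (batch, seq_len, d_model)
for layer in layers:
    x = layer(x)            # inside, self-attention projects x into Q, K and V
print(x.shape)              # torch.Size([2, 10, 64])
```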
1
vote
0 answers

Left-to-Right vs Encoder-decoder Models

Xu et al. (2022) distinguishes between popular pre-training methods for language modeling (see Section 2.1, Pretraining Methods): Left-to-Right: auto-regressive, left-to-right models predict the probability of a token given the previous…
1
vote
1 answer

How to Train a Decoder for Pre-trained BERT Transformer-Encoder?

Context: I am currently working on an encoder-decoder sequence-to-sequence model that uses a sequence of word embeddings as input and output, and then reduces the dimensionality of the word embeddings. The word embeddings are created using…
1
vote
1 answer

How is the transformer's output matrix size arrived at?

In this TensorFlow article, the comments in the code say that MHA should output with one of the dimensions being the sequence length of the query/key. However, that means that the second MHA in the decoder layer should output something with one of…
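A small check of the shape behaviour the question asks about, using PyTorch's MultiheadAttention rather than the TensorFlow tutorial's code (sizes are illustrative assumptions): the output sequence length follows the query, not the key/value.

```python
import torch
import torch.nn as nn

# Cross-attention shape check (assumed toy sizes): the output sequence length
# follows the query (decoder side), not the key/value (encoder side).
mha = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

q = torch.randn(2, 7, 64)    # (batch, tgt_len, d_model) from the decoder
kv = torch.randn(2, 12, 64)  # (batch, src_len, d_model) from the encoder

out, _ = mha(q, kv, kv)
print(out.shape)             # torch.Size([2, 7, 64]) -> matches tgt_len
```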
0
votes
1 answer

In which situations is it helpful to use the encoder, the decoder, or both in a transformer model?

I have some questions about using (encoder / decoder / encoder-decoder) transformer models, including (language) transformers and Vision Transformers. The overall form of a transformer consists of an encoder and a decoder. Depending on the model, you…
0
votes
1 answer

Is there a correct order of "conv2d", "batchnorm2d", "ReLU/LeakyReLU", "MaxPool2d" for UNet-like architectures?

Context: I've been investigating the UNet architecture for a little while now. After studying the structure of the official UNet architecture as proposed in the original paper, I noticed a recurrent pattern of…
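For reference, a sketch of one commonly used ordering in UNet-style down blocks (Conv2d -> BatchNorm2d -> ReLU, twice, then MaxPool2d); the channel sizes are placeholders, and this is one convention among several rather than the original paper's exact block (the original UNet used no batch norm).

```python
import torch.nn as nn

# One common UNet-style down block (channel sizes are placeholders):
# Conv2d -> BatchNorm2d -> ReLU, repeated, then MaxPool2d for downsampling.
def down_block(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )
```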
0
votes
1 answer

For a transformer decoder, how exactly are K, Q, and V computed at each decoding step?

For a transformer decoder, how exactly are K, Q, and V computed at each decoding step? Assume my input prompt is "today is a" (good day). At t = 0 (generation step 0): K, Q, and V are the projections of the sequence ("today is a"). Then say the next token…
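A conceptual sketch of what typically happens with a key/value cache during generation (projection matrices are omitted and all sizes are illustrative assumptions): only the newest token needs a query, while its key and value are appended to the cache built from the prompt.

```python
import torch

# Illustrative KV-cache step (projections omitted, sizes assumed):
d_model = 64
k_cache = torch.randn(1, 3, d_model)   # K for the prompt "today is a"
v_cache = torch.randn(1, 3, d_model)   # V for the prompt "today is a"

new_tok = torch.randn(1, 1, d_model)   # representation of the newly generated token
q = new_tok                            # query only for the new position
k_cache = torch.cat([k_cache, new_tok], dim=1)  # cache now covers 4 positions
v_cache = torch.cat([v_cache, new_tok], dim=1)

attn = torch.softmax(q @ k_cache.transpose(1, 2) / d_model ** 0.5, dim=-1) @ v_cache
print(attn.shape)                      # torch.Size([1, 1, 64])
```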
0
votes
1 answer

How do temperature and repetition penalty interfere?

I'm trying to demystify my understanding of various decoding parameters. Building on our understanding of temperature, how does the repetition penalty interfere with temperature? For example, does something special happen when the penalty is…
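A sketch of one common order of operations, loosely following how many Hugging-Face-style samplers behave (the exact behaviour depends on the library, so treat this as an assumption): the repetition penalty rescales the logits of already-generated tokens first, then temperature is applied before the softmax.

```python
import torch

def sample(logits, generated_ids, repetition_penalty=1.2, temperature=0.8):
    # Penalize tokens that were already generated (illustrative convention:
    # positive logits are divided, negative logits multiplied by the penalty).
    logits = logits.clone()
    for tok in set(generated_ids):
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty
    # Temperature is applied afterwards, just before the softmax.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, 1).item()
```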
0
votes
1 answer

How does mixing and matching encoders and decoders work in image segmentation?

I have a conceptual question regarding architectures. I am using this GitHub repository, which allows one to quickly put together a segmentation pipeline. In reading the readme, one thing that has me confused is the separation of the encoders and decoders…
0
votes
0 answers

Multi-task learning using a single encoder + single decoder structure?

It seems that a lot of researchers predominantly use a single-encoder + multiple-decoders structure to achieve multi-task learning in computer vision. Would it be reasonable to achieve multi-task learning using a single decoder to deal with…