Questions tagged [encoder-decoder]

17 questions
13
votes
4 answers

What exactly is a hidden state in an LSTM and RNN?

I'm working on a project where we use an encoder-decoder architecture. We decided to use an LSTM for both the encoder and the decoder because of its hidden states. In my specific case, the hidden state of the encoder is passed to the decoder, and this…
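A minimal PyTorch sketch of the setup described above, with the encoder's final (h, c) state used to initialize the decoder's LSTM; all sizes and names below are illustrative assumptions, not taken from the question.

```python
import torch
import torch.nn as nn

# Toy dimensions (assumed): the encoder's final (h, c) initializes the decoder.
enc = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
dec = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)

src = torch.randn(4, 10, 16)   # (batch, src_len, features)
tgt = torch.randn(4, 7, 8)     # (batch, tgt_len, features)

_, (h, c) = enc(src)           # h, c: (num_layers, batch, hidden_size)
dec_out, _ = dec(tgt, (h, c))  # decoder starts from the encoder's hidden state
print(dec_out.shape)           # torch.Size([4, 7, 32])
```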
4
votes
1 answer

Why do we need both encoder and decoder in sequence to sequence prediction?

Why do we need both encoder and decoder in sequence to sequence prediction? We could just have a single RNN that, given input $x$, outputs some value $y(t)$ and hidden state $h(t)$. Next, given $h(t)$ and $y(t)$, the next output $y(t+1)$ and hidden…
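A rough sketch of the single-RNN loop this question proposes, feeding y(t) and h(t) back in to obtain y(t+1); the GRU cell, sizes, and names are illustrative assumptions, not part of the question.

```python
import torch
import torch.nn as nn

# Illustrative single-RNN loop (assumed toy sizes): y(t) and h(t) are fed back
# to produce y(t+1), with no separate encoder or decoder.
cell = nn.GRUCell(input_size=8, hidden_size=32)
readout = nn.Linear(32, 8)

y = torch.randn(4, 8)      # initial input x
h = torch.zeros(4, 32)     # initial hidden state
outputs = []
for t in range(5):
    h = cell(y, h)         # h(t) from the previous output and hidden state
    y = readout(h)         # y(t) read out from h(t)
    outputs.append(y)
```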
3
votes
0 answers

What are the inputs (and shapes) to K/V/Q of the self-attention in EACH decoder block of a language-translation Transformer during inference?

The Transformer model from the original attention paper has a decoder that works differently during inference than during training. I'm trying to understand the shapes used in the decoder (both the self-attention and encoder-decoder attention blocks), but it's very…
2
votes
0 answers

Combining GANs and NLP for AI-Based Programming: Generating Input-Output Templates for Computer Functions

I would like to combine GANs and NLP to create a system that can take an input and generate an appropriate output. For example, given the input 9 to the power of 2, the system would output pow(9,2). I am not entirely sure how to research this, but I…
1
vote
1 answer

Why can decoder-only transformers be so good at machine translation?

In my understanding, encoder-decoder transformers for translation are trained with sentence or text pairs. How can it be explained in simple (high-level) terms that decoder-only transformers (e.g. GPT) are so good at machine translation, even though…
1
vote
1 answer

Transformers: how does stacking work?

An encoder block takes Q, K, V as inputs but produces a single output, i.e. 3 inputs vs. 1 output. How do you stack those blocks? Is there a more detailed diagram?
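One way to picture the stacking, sketched below with PyTorch's built-in encoder layer (sizes are illustrative assumptions): the single output of block i is reused as Q, K, and V of block i+1, so the "3 inputs vs. 1 output" mismatch disappears.

```python
import torch
import torch.nn as nn

# Illustrative stack of encoder blocks: each block's single output tensor
# becomes Q = K = V of the next block's self-attention.
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True) for _ in range(3)]
)

x = torch.randn(2, 10, 64)  # (batch, seq_len, d_model)
for layer in layers:
    x = layer(x)            # inside, self-attention projects x into Q, K and V
print(x.shape)              # torch.Size([2, 10, 64])
```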
1
vote
0 answers

Left-to-Right vs Encoder-decoder Models

Xu et al. (2022) distinguishes between popular pre-training methods for language modeling (see Section 2.1, Pretraining Methods): Left-to-Right: auto-regressive, left-to-right models predict the probability of a token given the previous…
1
vote
1 answer

How to Train a Decoder for Pre-trained BERT Transformer-Encoder?

Context: I am currently working on an encoder-decoder sequence-to-sequence model that uses a sequence of word embeddings as input and output, and then reduces the dimensionality of the word embeddings. The word embeddings are created using…
1
vote
1 answer

How is the transformer's output matrix size arrived at?

In this TensorFlow article, the comments in the code say that MHA should output with one of the dimensions being the sequence length of the query/key. However, that means that the second MHA in the decoder layer should output something with one of…
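A small check of the shape behaviour the question asks about, using PyTorch's MultiheadAttention rather than the TensorFlow tutorial's code (sizes are illustrative assumptions): the output sequence length follows the query, not the key/value.

```python
import torch
import torch.nn as nn

# Cross-attention shape check (assumed toy sizes): the output sequence length
# follows the query (decoder side), not the key/value (encoder side).
mha = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

q = torch.randn(2, 7, 64)    # (batch, tgt_len, d_model) from the decoder
kv = torch.randn(2, 12, 64)  # (batch, src_len, d_model) from the encoder

out, _ = mha(q, kv, kv)
print(out.shape)             # torch.Size([2, 7, 64]) -> matches tgt_len
```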
0
votes
1 answer

In which situations is it helpful to use the encoder, the decoder, or both in a transformer model?

I have some questions about using (encoder / decoder / encoder-decoder) transformer models, including (language) transformers and Vision Transformers. The overall form of a transformer consists of an encoder and a decoder. Depending on the model, you…
0
votes
1 answer

Is there a correct order of "conv2d", "batchnorm2d", "ReLU/LeakyReLU", "MaxPool2d" for UNet-like architectures?

Context: I've been investigating the UNet architecture for a little while now. After studying the structure of the official UNet architecture as proposed in the original paper, I noticed a recurrent pattern of…
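For reference, a sketch of one commonly used ordering in UNet-style down blocks (Conv2d -> BatchNorm2d -> ReLU, twice, then MaxPool2d); the channel sizes are placeholders, and this is one convention among several rather than the original paper's exact block (the original UNet used no batch norm).

```python
import torch.nn as nn

# One common UNet-style down block (channel sizes are placeholders):
# Conv2d -> BatchNorm2d -> ReLU, repeated, then MaxPool2d for downsampling.
def down_block(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )
```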
0
votes
1 answer

For a transformer decoder, how exactly are K, Q, and V computed at each decoding step?

For a transformer decoder, how exactly are K, Q, and V computed at each decoding step? Assume my input prompt is "today is a" (good day). At t = 0 (generation step 0): K, Q, and V are the projections of the sequence ("today is a"). Then say the next token…
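A conceptual sketch of what typically happens with a key/value cache during generation (projection matrices are omitted and all sizes are illustrative assumptions): only the newest token needs a query, while its key and value are appended to the cache built from the prompt.

```python
import torch

# Illustrative KV-cache step (projections omitted, sizes assumed):
d_model = 64
k_cache = torch.randn(1, 3, d_model)   # K for the prompt "today is a"
v_cache = torch.randn(1, 3, d_model)   # V for the prompt "today is a"

new_tok = torch.randn(1, 1, d_model)   # representation of the newly generated token
q = new_tok                            # query only for the new position
k_cache = torch.cat([k_cache, new_tok], dim=1)  # cache now covers 4 positions
v_cache = torch.cat([v_cache, new_tok], dim=1)

attn = torch.softmax(q @ k_cache.transpose(1, 2) / d_model ** 0.5, dim=-1) @ v_cache
print(attn.shape)                      # torch.Size([1, 1, 64])
```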
0
votes
1 answer

How do temperature and repetition penalty interfere?

I'm trying to demystify my understanding of various decoding parameters. Building on our understanding of temperature, how does the repetition penalty interfere with temperature? For example, does something special happen when the penalty is…
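A sketch of one common order of operations, loosely following how many Hugging-Face-style samplers behave (the exact behaviour depends on the library, so treat this as an assumption): the repetition penalty rescales the logits of already-generated tokens first, then temperature is applied before the softmax.

```python
import torch

def sample(logits, generated_ids, repetition_penalty=1.2, temperature=0.8):
    # Penalize tokens that were already generated (illustrative convention:
    # positive logits are divided, negative logits multiplied by the penalty).
    logits = logits.clone()
    for tok in set(generated_ids):
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty
    # Temperature is applied afterwards, just before the softmax.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, 1).item()
```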
0
votes
1 answer

How does mixing and matching encoders and decoders work in image segmentation?

I have a conceptual question regarding architectures. I am using this GitHub repository, which allows one to quickly put together a segmentation pipeline. In reading the readme, one thing that has me confused is the separation of the encoders and decoders…
0
votes
0 answers

Multi-task learning using a single encoder + single decoder structure?

It seems that a lot of researchers predominantly use a single-encoder + multiple-decoders structure to achieve multi-task learning in computer vision. Would it be reasonable to achieve multi-task learning using a single decoder to deal with…