Questions tagged [seq2seq]
For questions related to sequence-to-sequence (seq2seq) machine learning models/architectures, used e.g. in machine translation.
28 questions
13 votes · 4 answers
What exactly is a hidden state in an LSTM and RNN?
I'm working on a project, where we use an encoder-decoder architecture. We decided to use an LSTM for both the encoder and decoder due to its hidden states. In my specific case, the hidden state of the encoder is passed to the decoder, and this…

user8714896 · 717
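For the question above, a minimal sketch (PyTorch; the dimensions are illustrative assumptions, not taken from the asker's project) of an encoder LSTM handing its final hidden and cell states to a decoder LSTM:

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim = 32, 64                # illustrative sizes only

encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

src = torch.randn(8, 10, embed_dim)           # (batch, src_len, embed_dim)
tgt = torch.randn(8, 7, embed_dim)            # (batch, tgt_len, embed_dim)

# h_n and c_n are fixed-size summaries of the whole source sequence,
# each of shape (num_layers, batch, hidden_dim)
_, (h_n, c_n) = encoder(src)

# The decoder starts from the encoder's final states instead of zeros
dec_out, _ = decoder(tgt, (h_n, c_n))
print(dec_out.shape)                          # torch.Size([8, 7, 64])
```

Here the pair (h_n, c_n) is the "hidden state" being passed: a fixed-size summary of the source sequence that initializes the decoder.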
5 votes · 1 answer
What's the difference between content-based attention and dot-product attention?
I'm following this blog post which enumerates the various types of attention.
It mentions content-based attention where the alignment scoring function for the $j$th encoder hidden state with respect to the $i$th context vector is the cosine…

Alexander Soare · 1,319
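For the question above, a hedged sketch of the two scoring functions being contrasted, written with plain tensors (the names and sizes are assumptions, not taken from the linked blog post):

```python
import torch
import torch.nn.functional as F

query = torch.randn(64)       # e.g. the i-th context/decoder vector
keys = torch.randn(10, 64)    # e.g. the encoder hidden states h_1..h_10

# Dot-product attention: raw inner products
dot_scores = keys @ query

# Content-based attention: cosine similarity, i.e. dot products of
# length-normalized vectors
cos_scores = F.cosine_similarity(keys, query.unsqueeze(0), dim=-1)

dot_weights = torch.softmax(dot_scores, dim=0)
cos_weights = torch.softmax(cos_scores, dim=0)
```

In this sketch the only difference is the length normalization: cosine scores ignore the vectors' magnitudes, while dot-product scores grow with them.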
4 votes · 1 answer
Can Reinforcement Learning be used to generate sequences?
Can we use reinforcement learning for sequence-to-sequence tasks? If yes, how could this be done, and would it be a good choice?

penguin_smasher · 41
3 votes · 2 answers
Is seq2seq the best model when input/output sequences have fixed length?
I understand that seq2seq models are perfectly suitable when the input and/or the output have variable lengths. However, if we know the input and output sequence lengths of the neural network exactly, is this still the best approach?

Petrus · 31
3 votes · 0 answers
What is the difference between zero-padding and character-padding in Recurrent Neural Networks?
For RNNs to work efficiently, we vectorize the operations, which results in an input matrix of shape
(m, max_seq_len)
where m is the number of examples, e.g. sentences, and max_seq_len is the maximum length that a sentence can have. Some examples…

PhysicsMan · 31
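For the question above, a minimal sketch of the zero-padding step that produces the (m, max_seq_len) matrix (the token ids are made up, and 0 is assumed to be the padding id):

```python
import numpy as np

# Three example sentences as token-id lists of different lengths (m = 3)
sentences = [[5, 8, 2], [7, 1], [3, 9, 4, 6]]
max_seq_len = max(len(s) for s in sentences)

# Allocate an all-zero matrix and copy each sentence into its row,
# so shorter sentences are padded on the right with the id 0
batch = np.zeros((len(sentences), max_seq_len), dtype=np.int64)
for i, sent in enumerate(sentences):
    batch[i, :len(sent)] = sent

print(batch.shape)   # (3, 4)
print(batch)
```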
2 votes · 0 answers
Any models for text to JSON?
There are many sequence-to-sequence (seq2seq) models and end-to-end models, such as text-to-SQL. I was wondering: are there any text-to-JSON deep learning models?
For example:
Text
"Switch on the computer".
JSON:
{"actions":["switch on"],…

tired and bored dev · 121
2 votes · 0 answers
Are there any successful applications of transformers of small size (<10k weights)?
In NLP and sequence-modeling problems, Transformer architectures based on the self-attention mechanism (proposed in Attention Is All You Need) have achieved impressive results and are now the first choice in this sort of…

spiridon_the_sun_rotator · 2,454
2 votes · 1 answer
How is Google Translate able to convert texts of different lengths?
In my experience with TensorFlow and many other frameworks, neural networks have to have a fixed shape for any output, so how does Google Translate handle texts of different lengths?

Gavin · 49
2 votes · 0 answers
What is the time complexity of the forward pass and back-propagation of the sequence-to-sequence model with and without attention?
I keep looking through the literature, but can't seem to find any information regarding the time complexity of the forward pass and back-propagation of the sequence-to-sequence RNN encoder-decoder model, with and without attention.
The paper…

user1234544 · 53
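For the question above, a rough, hedged back-of-the-envelope estimate (assuming a single-layer RNN encoder-decoder with hidden size $d$, source length $T_s$, and target length $T_t$; the papers being asked about may count differently): without attention, the recurrences dominate and one forward pass costs $O((T_s + T_t)\,d^2)$, with back-propagation through time of the same order. With (additive or dot-product) attention, every one of the $T_t$ decoder steps also scores all $T_s$ encoder states and forms a context vector, adding $O(T_s T_t\, d)$, for $O((T_s + T_t)\,d^2 + T_s T_t\,d)$ overall.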
1 vote · 0 answers
The model's accuracy suddenly becomes unreasonably good at the beginning of the training process. I need an explanation
I am practicing machine translation using a seq2seq model (more specifically, with GRU/LSTM units). The following is my first model:
This model first achieved an accuracy of about 0.03 and gradually improved after that. That seems normal.
But when I…

Đạt Trần · 11
1 vote · 1 answer
Why is it called a Seq2Seq model if the output is just a number?
Why is it called a Seq2Seq model if the output is just a number?
For example, if you are trying to predict a movie recommendation, and you are inputting a sequence of users and their ratings, shouldn't it be a Seq2Number model, since you're only…

Katsu · 151
1 vote · 1 answer
Is the decoder in a transformer Seq2Seq model non-parallelizable?
From my understanding, seq2seq models work by first computing a representation of the input sequence, and feeding this to the decoder. The decoder then predicts each token in the output sequence in an autoregressive manner. In this sense, it's…

Andrew Tang · 31
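For the question above, a sketch of the sequential constraint at inference time: each new token is predicted from everything generated so far, so the loop cannot be parallelized across output positions (decoder_step, the token ids, and the vocabulary size are all hypothetical stand-ins):

```python
import torch

def decoder_step(generated: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a full decoder forward pass:
    returns next-token logits over a 100-word vocabulary."""
    return torch.randn(100)

BOS, EOS, max_len = 1, 2, 20
generated = torch.tensor([BOS])

for _ in range(max_len):
    logits = decoder_step(generated)              # depends on all tokens so far
    next_token = int(torch.argmax(logits))
    generated = torch.cat([generated, torch.tensor([next_token])])
    if next_token == EOS:
        break
```

During training with teacher forcing, by contrast, all target positions are already known, so a transformer decoder can process them in a single masked, parallel pass; the sequential bottleneck is specific to autoregressive generation.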
1 vote · 2 answers
How does Seq2Seq with attention actually use the attention (i.e. the context vector)?
For neural machine translation, there's this model "Seq2Seq with attention", also known as the "Bahdanau architecture" (a good image can be found on this page), where instead of Seq2Seq's encoder LSTM passing a single hidden vector $\vec h[T]$ to…

Mew · 181
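For the question above, a hedged sketch of a single attention-equipped decoder step, showing where the context vector enters (the alignment model here is a simplified linear layer rather than Bahdanau's $v_a^\top \tanh(W_a s + U_a h)$, and all dimensions are illustrative):

```python
import torch
import torch.nn as nn

hidden_dim, vocab_size, src_len = 64, 100, 10
enc_states = torch.randn(src_len, hidden_dim)   # encoder states h_1 .. h_T
dec_state = torch.randn(hidden_dim)             # current decoder state s_t

score_layer = nn.Linear(2 * hidden_dim, 1)      # simplified alignment model
out_layer = nn.Linear(2 * hidden_dim, vocab_size)

# Score every encoder state against the current decoder state,
# then turn the scores into attention weights
pairs = torch.cat([enc_states,
                   dec_state.unsqueeze(0).expand(src_len, -1)], dim=-1)
alpha = torch.softmax(score_layer(pairs).squeeze(-1), dim=0)

# The context vector is a weighted sum over ALL encoder states,
# recomputed at every decoder step (unlike plain Seq2Seq's single h[T])
context = (alpha.unsqueeze(-1) * enc_states).sum(dim=0)

# The context is combined (here: concatenated) with the decoder state
# to produce the next-token distribution
logits = out_layer(torch.cat([dec_state, context], dim=-1))
```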
1 vote · 1 answer
How to Train a Decoder for Pre-trained BERT Transformer-Encoder?
Context:
I am currently working on an encoder-decoder sequence-to-sequence model that uses a sequence of word embeddings as input and output, and then reduces the dimensionality of the word embeddings.
The word embeddings are created using…

nesquick · 11
1 vote · 0 answers
When training a seq2seq model, is it better to train using the model's outputs or the expected outputs?
When training any seq2seq model, you have a target and a source. The source may be a sentence such as:
I_walked_the_dog
and the target:
_walked_the_dogg
where, as you can see, the expected output for the initial I is a space _. My question is,…

Recessive · 1,346
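The trade-off the question above is asking about is usually called teacher forcing (feed the expected token back in) versus free-running (feed the model's own prediction back in), with scheduled sampling mixing the two. A hedged sketch, where decoder_step is a hypothetical stand-in returning (logits, state) for one step and target_tokens is a 1-D tensor of token ids:

```python
import random
import torch

def train_step(decoder_step, target_tokens, init_state, teacher_forcing_ratio=0.5):
    """Accumulate the loss over one target sequence, choosing per step whether
    to feed back the ground-truth token or the model's own prediction."""
    criterion = torch.nn.CrossEntropyLoss()
    state, prev_token, loss = init_state, target_tokens[0], 0.0
    for t in range(1, len(target_tokens)):
        logits, state = decoder_step(prev_token, state)        # one decoder step
        loss = loss + criterion(logits.unsqueeze(0), target_tokens[t].unsqueeze(0))
        if random.random() < teacher_forcing_ratio:
            prev_token = target_tokens[t]                      # teacher forcing
        else:
            prev_token = logits.argmax()                       # free-running
    return loss
```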