Questions tagged [seq2seq]
For questions related to sequence-to-sequence (seq2seq) machine learning models/architectures, used e.g. in machine translation.
28 questions
13 votes · 4 answers
What exactly is a hidden state in an LSTM and RNN?
I'm working on a project, where we use an encoder-decoder architecture. We decided to use an LSTM for both the encoder and decoder due to its hidden states. In my specific case, the hidden state of the encoder is passed to the decoder, and this…

user8714896 · 717
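For the question above, a minimal sketch (PyTorch; the dimensions are illustrative assumptions, not taken from the asker's project) of an encoder LSTM handing its final hidden and cell states to a decoder LSTM:

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim = 32, 64                # illustrative sizes only

encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

src = torch.randn(8, 10, embed_dim)           # (batch, src_len, embed_dim)
tgt = torch.randn(8, 7, embed_dim)            # (batch, tgt_len, embed_dim)

# h_n and c_n are fixed-size summaries of the whole source sequence,
# each of shape (num_layers, batch, hidden_dim)
_, (h_n, c_n) = encoder(src)

# The decoder starts from the encoder's final states instead of zeros
dec_out, _ = decoder(tgt, (h_n, c_n))
print(dec_out.shape)                          # torch.Size([8, 7, 64])
```

Here the pair (h_n, c_n) is the "hidden state" being passed: a fixed-size summary of the source sequence that initializes the decoder.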
5 votes · 1 answer
What's the difference between content-based attention and dot-product attention?
I'm following this blog post which enumerates the various types of attention.
It mentions content-based attention where the alignment scoring function for the $j$th encoder hidden state with respect to the $i$th context vector is the cosine…

Alexander Soare · 1,319
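For the question above, a hedged sketch of the two scoring functions being contrasted, written with plain tensors (the names and sizes are assumptions, not taken from the linked blog post):

```python
import torch
import torch.nn.functional as F

query = torch.randn(64)       # e.g. the i-th context/decoder vector
keys = torch.randn(10, 64)    # e.g. the encoder hidden states h_1..h_10

# Dot-product attention: raw inner products
dot_scores = keys @ query

# Content-based attention: cosine similarity, i.e. dot products of
# length-normalized vectors
cos_scores = F.cosine_similarity(keys, query.unsqueeze(0), dim=-1)

dot_weights = torch.softmax(dot_scores, dim=0)
cos_weights = torch.softmax(cos_scores, dim=0)
```

In this sketch the only difference is the length normalization: cosine scores ignore the vectors' magnitudes, while dot-product scores grow with them.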
4 votes · 1 answer
Can Reinforcement Learning be used to generate sequences?
Can we use reinforcement learning for sequence-to-sequence tasks? If yes, how could this be done, and would it be a good choice?

penguin_smasher · 41
3 votes · 2 answers
Is seq2seq the best model when input/output sequences have fixed length?
I understand that seq2seq models are perfectly suitable when the input and/or the output have variable lengths. However, if we know the input and output sequence lengths of the neural network exactly, is this still the best approach?

Petrus · 31
3 votes · 0 answers
What is the difference between zero-padding and character-padding in Recurrent Neural Networks?
For RNNs to work efficiently, we vectorize the operations, which results in an input matrix of shape
(m, max_seq_len)
where m is the number of examples, e.g. sentences, and max_seq_len is the maximum length that a sentence can have. Some examples…

PhysicsMan · 31
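For the question above, a minimal sketch of the zero-padding step that produces the (m, max_seq_len) matrix (the token ids are made up, and 0 is assumed to be the padding id):

```python
import numpy as np

# Three example sentences as token-id lists of different lengths (m = 3)
sentences = [[5, 8, 2], [7, 1], [3, 9, 4, 6]]
max_seq_len = max(len(s) for s in sentences)

# Allocate an all-zero matrix and copy each sentence into its row,
# so shorter sentences are padded on the right with the id 0
batch = np.zeros((len(sentences), max_seq_len), dtype=np.int64)
for i, sent in enumerate(sentences):
    batch[i, :len(sent)] = sent

print(batch.shape)   # (3, 4)
print(batch)
```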
2 votes · 0 answers
Any models for text to JSON?
There are many sequence-to-sequence (seq2seq) models and end-to-end models, such as text-to-SQL. I was wondering: are there any text-to-JSON deep learning models?
For example:
Text
"Switch on the computer".
JSON:
{"actions":["switch on"],…

tired and bored dev · 121
2 votes · 0 answers
Are there any successful applications of transformers of small size (<10k weights)?
In NLP and sequence-modeling problems, Transformer architectures based on the self-attention mechanism (proposed in Attention Is All You Need) have achieved impressive results and are now the first choice in this sort of…

spiridon_the_sun_rotator · 2,454
2 votes · 1 answer
How is Google Translate able to convert texts of different lengths?
In my experience with TensorFlow and many other frameworks, neural networks have to have a fixed shape for any output, so how does Google Translate handle texts of different lengths?

Gavin · 49
2 votes · 0 answers
What is the time complexity of the forward pass and back-propagation of the sequence-to-sequence model with and without attention?
I keep looking through the literature, but can't seem to find any information regarding the time complexity of the forward pass and back-propagation of the sequence-to-sequence RNN encoder-decoder model, with and without attention.
The paper…

user1234544 · 53
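For the question above, a rough, hedged back-of-the-envelope estimate (assuming a single-layer RNN encoder-decoder with hidden size $d$, source length $T_s$, and target length $T_t$; the papers being asked about may count differently): without attention, the recurrences dominate and one forward pass costs $O((T_s + T_t)\,d^2)$, with back-propagation through time of the same order. With (additive or dot-product) attention, every one of the $T_t$ decoder steps also scores all $T_s$ encoder states and forms a context vector, adding $O(T_s T_t\, d)$, for $O((T_s + T_t)\,d^2 + T_s T_t\,d)$ overall.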
1 vote · 0 answers
The model's accuracy suddenly becomes unreasonably good at the beginning of the training process. I need an explanation
I am practicing machine translation using a seq2seq model (more specifically, with GRU/LSTM units). The following is my first model:
This model first achieved an accuracy of about 0.03 and gradually improved after that. That seems normal.
But when I…

Đạt Trần · 11
1 vote · 1 answer
Why is it called a Seq2Seq model if the output is just a number?
Why is it called a Seq2Seq model if the output is just a number?
For example, if you are trying to predict a movie recommendation, and you are inputting a sequence of users and their ratings, shouldn't it be a Seq2Number model, since you're only…

Katsu · 151
1 vote · 1 answer
Is the decoder in a transformer Seq2Seq model non-parallelizable?
From my understanding, seq2seq models work by first computing a representation of the input sequence, and feeding this to the decoder. The decoder then predicts each token in the output sequence in an autoregressive manner. In this sense, it's…

Andrew Tang · 31
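For the question above, a sketch of the sequential constraint at inference time: each new token is predicted from everything generated so far, so the loop cannot be parallelized across output positions (decoder_step, the token ids, and the vocabulary size are all hypothetical stand-ins):

```python
import torch

def decoder_step(generated: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a full decoder forward pass:
    returns next-token logits over a 100-word vocabulary."""
    return torch.randn(100)

BOS, EOS, max_len = 1, 2, 20
generated = torch.tensor([BOS])

for _ in range(max_len):
    logits = decoder_step(generated)              # depends on all tokens so far
    next_token = int(torch.argmax(logits))
    generated = torch.cat([generated, torch.tensor([next_token])])
    if next_token == EOS:
        break
```

During training with teacher forcing, by contrast, all target positions are already known, so a transformer decoder can process them in a single masked, parallel pass; the sequential bottleneck is specific to autoregressive generation.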
1 vote · 2 answers
How does Seq2Seq with attention actually use the attention (i.e. the context vector)?
For neural machine translation, there's this model "Seq2Seq with attention", also known as the "Bahdanau architecture" (a good image can be found on this page), where instead of Seq2Seq's encoder LSTM passing a single hidden vector $\vec h[T]$ to…

Mew · 181
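For the question above, a hedged sketch of a single attention-equipped decoder step, showing where the context vector enters (the alignment model here is a simplified linear layer rather than Bahdanau's $v_a^\top \tanh(W_a s + U_a h)$, and all dimensions are illustrative):

```python
import torch
import torch.nn as nn

hidden_dim, vocab_size, src_len = 64, 100, 10
enc_states = torch.randn(src_len, hidden_dim)   # encoder states h_1 .. h_T
dec_state = torch.randn(hidden_dim)             # current decoder state s_t

score_layer = nn.Linear(2 * hidden_dim, 1)      # simplified alignment model
out_layer = nn.Linear(2 * hidden_dim, vocab_size)

# Score every encoder state against the current decoder state,
# then turn the scores into attention weights
pairs = torch.cat([enc_states,
                   dec_state.unsqueeze(0).expand(src_len, -1)], dim=-1)
alpha = torch.softmax(score_layer(pairs).squeeze(-1), dim=0)

# The context vector is a weighted sum over ALL encoder states,
# recomputed at every decoder step (unlike plain Seq2Seq's single h[T])
context = (alpha.unsqueeze(-1) * enc_states).sum(dim=0)

# The context is combined (here: concatenated) with the decoder state
# to produce the next-token distribution
logits = out_layer(torch.cat([dec_state, context], dim=-1))
```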
1 vote · 1 answer
How to Train a Decoder for Pre-trained BERT Transformer-Encoder?
Context:
I am currently working on an encoder-decoder sequence-to-sequence model that uses a sequence of word embeddings as input and output, and then reduces the dimensionality of the word embeddings.
The word embeddings are created using…

nesquick · 11
1 vote · 0 answers
When training a seq2seq model, is it better to train using the model's outputs or the expected outputs?
When training any seq2seq model, you have a target and a source. The source may be a sentence such as:
I_walked_the_dog
and the target:
_walked_the_dogg
where, as you can see, the expected output for the initial I is a space _. My question is,…

Recessive · 1,346
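The trade-off the question above is asking about is usually called teacher forcing (feed the expected token back in) versus free-running (feed the model's own prediction back in), with scheduled sampling mixing the two. A hedged sketch, where decoder_step is a hypothetical stand-in returning (logits, state) for one step and target_tokens is a 1-D tensor of token ids:

```python
import random
import torch

def train_step(decoder_step, target_tokens, init_state, teacher_forcing_ratio=0.5):
    """Accumulate the loss over one target sequence, choosing per step whether
    to feed back the ground-truth token or the model's own prediction."""
    criterion = torch.nn.CrossEntropyLoss()
    state, prev_token, loss = init_state, target_tokens[0], 0.0
    for t in range(1, len(target_tokens)):
        logits, state = decoder_step(prev_token, state)        # one decoder step
        loss = loss + criterion(logits.unsqueeze(0), target_tokens[t].unsqueeze(0))
        if random.random() < teacher_forcing_ratio:
            prev_token = target_tokens[t]                      # teacher forcing
        else:
            prev_token = logits.argmax()                       # free-running
    return loss
```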