Questions tagged [seq2seq]

For questions related to sequence-to-sequence (seq2seq) machine learning models/architectures, used e.g. in machine translation.

28 questions
13 votes, 4 answers

What exactly is a hidden state in an LSTM and RNN?

I'm working on a project where we use an encoder-decoder architecture. We decided to use an LSTM for both the encoder and the decoder due to its hidden states. In my specific case, the hidden state of the encoder is passed to the decoder, and this…
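As a minimal sketch of the pattern the question describes (assuming PyTorch; the layer names and toy sizes are hypothetical), the encoder LSTM's final hidden and cell states can simply be handed to the decoder LSTM as its initial states:

    # Sketch: pass the encoder's final (hidden, cell) state into the decoder.
    import torch
    import torch.nn as nn

    vocab_size, emb_dim, hidden_dim = 100, 32, 64       # hypothetical toy sizes
    embed = nn.Embedding(vocab_size, emb_dim)
    encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
    decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    src = torch.randint(0, vocab_size, (1, 7))           # (batch, src_len)
    tgt = torch.randint(0, vocab_size, (1, 5))           # (batch, tgt_len)

    _, (h, c) = encoder(embed(src))                      # encoder's last hidden/cell state
    dec_out, _ = decoder(embed(tgt), (h, c))             # decoder starts from that state
    print(dec_out.shape)                                 # torch.Size([1, 5, 64])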
5 votes, 1 answer

What's the difference between content-based attention and dot-product attention?

I'm following this blog post, which enumerates the various types of attention. It mentions content-based attention, where the alignment scoring function for the $j$th encoder hidden state with respect to the $i$th context vector is the cosine…
Alexander Soare • 1,319
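For comparison, a small NumPy sketch of the two scoring functions mentioned there (sizes and variable names are assumptions, not from the post): dot-product scoring is a plain inner product, while content-based scoring is the cosine similarity, i.e. the same inner product normalised by the vector norms:

    import numpy as np

    H = np.random.randn(6, 8)    # 6 encoder hidden states of dimension 8 (toy sizes)
    s = np.random.randn(8)       # current decoder/query vector

    dot_scores = H @ s                                                           # score_j = s . h_j
    cos_scores = dot_scores / (np.linalg.norm(H, axis=1) * np.linalg.norm(s))    # cosine(s, h_j)

    # Either score vector is then typically softmax-normalised into attention weights.
    weights = np.exp(dot_scores) / np.exp(dot_scores).sum()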
4 votes, 1 answer

Can Reinforcement Learning be used to generate sequences?

Can we use reinforcement learning for sequence-to-sequence tasks? If yes, and regardless of whether it is a good choice, how could this be done?
3 votes, 2 answers

Is seq2seq the best model when input/output sequences have fixed length?

I understand that seq2seq models are perfectly suitable when the input and/or the output have variable lengths. However, if we know the input/output sequence lengths of the neural network exactly, is this still the best approach?
Petrus • 31
3 votes, 0 answers

What is the difference between zero-padding and character-padding in Recurrent Neural Networks?

For RNNs to work efficiently, we vectorize the operations, which results in an input matrix of shape (m, max_seq_len), where m is the number of examples, e.g. sentences, and max_seq_len is the maximum length that a sentence can have. Some examples…
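A minimal sketch of the zero-padding side of that comparison (plain NumPy; the token indices are made up): shorter sentences are right-padded with a reserved index 0 so the batch stacks into the (m, max_seq_len) matrix described above.

    import numpy as np

    sentences = [[5, 2, 9], [7, 1], [3, 8, 4, 6]]         # hypothetical token-index sequences
    max_seq_len = max(len(s) for s in sentences)

    batch = np.zeros((len(sentences), max_seq_len), dtype=np.int64)   # 0 = padding index
    for i, s in enumerate(sentences):
        batch[i, :len(s)] = s

    print(batch.shape)                                     # (3, 4), i.e. (m, max_seq_len)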
2 votes, 0 answers

Any models for text to JSON?

There are many sequence-to-sequence (seq2seq) models and end-to-end models, like text-to-SQL. I was wondering whether there are any text-to-JSON deep learning models. For example: Text: "Switch on the computer". JSON: {"actions":["switch on"],…
2 votes, 0 answers

Are there any successful applications of transformers of small size (<10k weights)?

In NLP and sequence-modeling problems, Transformer architectures based on the self-attention mechanism (proposed in Attention Is All You Need) have achieved impressive results and are now the first choice in this sort of…
2 votes, 1 answer

How is Google Translate able to convert texts of different lengths?

In my experience with TensorFlow and many other frameworks, neural networks have to have a fixed shape for any output, so how does Google Translate convert texts of different lengths?
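One way to see how variable-length output is possible (a sketch in plain Python with a hypothetical decoder_step function, not Google's actual system): the decoder emits one token per step and the loop simply stops once an end-of-sequence token appears, so the per-step shapes stay fixed while the overall output length varies.

    EOS = 2  # hypothetical end-of-sequence token id

    def greedy_decode(decoder_step, state, max_len=50):
        """Generate tokens one step at a time until EOS; decoder_step is assumed
        to map (previous_token, state) -> (next_token, new_state)."""
        tokens, prev = [], None
        for _ in range(max_len):
            prev, state = decoder_step(prev, state)
            if prev == EOS:
                break
            tokens.append(prev)
        return tokens   # length differs from input to input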
2 votes, 0 answers

What is the time complexity of the forward pass and back-propagation of the sequence-to-sequence model with and without attention?

I keep looking through the literature, but can't seem to find any information regarding the time complexity of the forward pass and back-propagation of the sequence-to-sequence RNN encoder-decoder model, with and without attention. The paper…
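For orientation only (a back-of-the-envelope estimate under standard assumptions, not a result from the paper the question refers to): with hidden size $d$, source length $n$ and target length $m$, each RNN step costs roughly $O(d^2)$, attention adds an $O(d)$ score and context computation over the $n$ source positions at each of the $m$ decoder steps, and back-propagation has the same asymptotic cost as the forward pass:

    \begin{align*}
    \text{seq2seq without attention} &: \; O\big((n + m)\, d^{2}\big) \\
    \text{seq2seq with attention}    &: \; O\big((n + m)\, d^{2} + n\, m\, d\big)
    \end{align*}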
1 vote, 0 answers

The model's accuracy suddenly becomes unreasonably good at the beginning of the training process. I need an explanation

I am practicing machine translation using a seq2seq model (more specifically, with GRU/LSTM units). The following is my first model: it initially achieved an accuracy score of about 0.03 and gradually improved after that, which seems normal. But when I…
1 vote, 1 answer

Why is it called a Seq2Seq model if the output is just a number?

Why is it called a Seq2Seq model if the output is just a number? For example, if you are trying to predict a movie's recommendation, and you are inputting a sequence of users and their ratings, shouldn't it be a Seq2Number model since you're only…
Katsu • 151
1 vote, 1 answer

Is the decoder in a transformer Seq2Seq model non-parallelizable?

From my understanding, seq2seq models work by first computing a representation of the input sequence and feeding this to the decoder. The decoder then predicts each token in the output sequence in an autoregressive manner. In this sense, it's…
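A small NumPy sketch of why training (unlike generation) can still run in parallel (toy sizes, assumed): with teacher forcing the decoder sees every target position at once, and a causal mask stops position $i$ from attending to later positions, so only autoregressive inference has to proceed token by token.

    import numpy as np

    tgt_len = 5
    causal_mask = np.triu(np.ones((tgt_len, tgt_len), dtype=bool), k=1)  # True above the diagonal

    scores = np.random.randn(tgt_len, tgt_len)         # raw self-attention scores for all positions at once
    scores[causal_mask] = -np.inf                      # block attention to future tokens
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)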
1 vote, 2 answers

How does Seq2Seq with attention actually use the attention (i.e. the context vector)?

For neural machine translation, there's this model "Seq2Seq with attention", also known as the "Bahdanau architecture" (a good image can be found on this page), where instead of Seq2Seq's encoder LSTM passing a single hidden vector $\vec h[T]$ to…
Mew • 181
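For reference, a sketch of the Bahdanau-style step in the notation of the excerpt (the exact symbols are an assumption): each decoder step $i$ scores all encoder states, softmax-normalises the scores, and uses the weighted sum as a per-step context vector instead of a single final $\vec h[T]$.

    \begin{align*}
    e_{ij}      &= a\big(\vec s[i-1], \vec h[j]\big)                   && \text{alignment score against encoder state } j \\
    \alpha_{ij} &= \frac{\exp(e_{ij})}{\sum_{k=1}^{T} \exp(e_{ik})}    && \text{attention weights over the source positions} \\
    \vec c[i]   &= \sum_{j=1}^{T} \alpha_{ij}\, \vec h[j]              && \text{context vector used at decoder step } i
    \end{align*}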
1 vote, 1 answer

How to Train a Decoder for Pre-trained BERT Transformer-Encoder?

Context: I am currently working on an encoder-decoder sequence-to-sequence model that uses a sequence of word embeddings as input and output, and then reduces the dimensionality of the word embeddings. The word embeddings are created using…
1 vote, 0 answers

When training a seq2seq model, is it better to train using the model's outputs or the expected outputs?

When training any seq2seq model, you have a target and a source. The source may be a sentence such as I_walked_the_dog and the target _walked_the_dogg, where, as you can see, the expected output for the initial I is a space _. My question is,…
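A common middle ground between the two options in the question is scheduled sampling; a minimal sketch (plain Python, hypothetical helper and argument names) that picks, per step, whether the next decoder input is the ground-truth token or the model's own previous prediction:

    import random

    def choose_decoder_inputs(gold_tokens, predicted_tokens, teacher_forcing_ratio=0.5):
        """For each step, feed either the expected (gold) token or the model's own
        previous output, using gold with probability teacher_forcing_ratio."""
        inputs = []
        for gold, pred in zip(gold_tokens, predicted_tokens):
            inputs.append(gold if random.random() < teacher_forcing_ratio else pred)
        return inputs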