Questions tagged [sequence-modeling]

Questions about the analysis and modelling of sequential data, for example audio signals or time series to be predicted.

66 questions
23
votes
3 answers

Can the decoder in a transformer model be parallelized like the encoder?

Can the decoder in a transformer model be parallelized like the encoder? As far as I understand, the encoder has all the tokens in the sequence to compute the self-attention scores. But for a decoder, this is not possible (in both training and…
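A point this question hinges on is that during training the decoder can process all target positions in parallel by applying a causal mask (teacher forcing); only autoregressive inference is inherently sequential. Below is a minimal NumPy sketch of causally masked self-attention; the shapes and names are chosen purely for illustration, not taken from the question.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a whole sequence at once,
    with a causal mask so position i only attends to positions <= i."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # each (seq_len, d)
    scores = q @ k.T / np.sqrt(k.shape[-1])       # (seq_len, seq_len)
    mask = np.triu(np.ones_like(scores), k=1)     # 1 strictly above the diagonal
    scores = np.where(mask == 1, -1e9, scores)    # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # (seq_len, d)

# Toy example: during training, all target positions are computed in one pass.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                       # 5 tokens, model dimension 8
w = [rng.normal(size=(8, 8)) for _ in range(3)]
print(causal_self_attention(x, *w).shape)         # (5, 8)
```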
8
votes
4 answers

How can I predict the next number in a non-obvious sequence?

I've got an array of integers ranging from -3 to +3. Example: [1, 3, -2, 0, 0, 1] The array has no obvious pattern since it represents bipolar disorder mood swings. What is the most suitable approach to predict the next number in the series? The…
6
votes
2 answers

What evaluation metrics are used for sequence-to-sequence prediction problems?

I am solving many sequence-to-sequence prediction problems using RNNs/LSTMs. What types of evaluation metrics can be used for sequence prediction problems? One metric is the mean squared error (MSE), which we can pass as a parameter during the training…
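For the MSE case mentioned in the excerpt, one common convention (assumed here, not prescribed by the question) is to average the squared error over sequences, time steps and features alike. A short NumPy illustration:

```python
import numpy as np

def sequence_mse(y_true, y_pred):
    """Mean squared error averaged over sequences and time steps."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Two predicted sequences of length 4 versus their targets.
targets = [[0.0, 1.0, 2.0, 3.0], [1.0, 0.0, -1.0, -2.0]]
preds   = [[0.1, 0.9, 2.2, 2.8], [1.1, 0.2, -0.9, -2.1]]
print(sequence_mse(targets, preds))  # ~0.02
```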
6
votes
3 answers

In sequence-to-sequence, why is the output of the decoder used as its input?

The basic seq2seq model consists of two parts: a recurrent encoder that compresses a sequence into a vector and a decoder that unrolls the vector into the output sequence. Why is the output, w, x, y, z, of the decoder used as its input? Shouldn't the…
user8426627
  • 358
  • 1
  • 11
5
votes
2 answers

Why do Transformers have a sequence limit at inference time?

As far as I understand, a Transformer's time complexity grows quadratically with the sequence length. As a result, a maximum sequence length is set during training to keep it feasible, and, to allow batching, all sequences smaller…
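The quadratic growth the excerpt refers to comes from the n × n attention-score matrix that each head builds. A back-of-the-envelope sketch (sequence lengths and float size assumed for illustration):

```python
# Memory for one attention-score matrix in float32, per head and per layer.
def attention_matrix_bytes(seq_len: int, bytes_per_float: int = 4) -> int:
    return seq_len * seq_len * bytes_per_float

for n in (512, 2048, 8192):
    print(n, f"{attention_matrix_bytes(n) / 2**20:.1f} MiB")
# 512 -> 1.0 MiB, 2048 -> 16.0 MiB, 8192 -> 256.0 MiB:
# 4x the sequence length costs 16x the memory.
```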
5
votes
1 answer

Why do small datasets require more samples, while big datasets require fewer samples in negative sampling?

In the Deep Learning Specialization course by Andrew Ng, in the video Sequence Models (minute 4:13), he says that in negative sampling we have to choose a sample of words from the corpus to train on, rather than using the whole corpus. But he said…
4
votes
1 answer

Why do we need both encoder and decoder in sequence to sequence prediction?

Why do we need both an encoder and a decoder in sequence-to-sequence prediction? We could just have a single RNN that, given input $x$, outputs some value $y(t)$ and a hidden state $h(t)$. Next, given $h(t)$ and $y(t)$, the next output $y(t+1)$ and hidden…
4
votes
1 answer

Can Reinforcement Learning be used to generate sequences?

Can we use reinforcement learning for sequence-to-sequence tasks? If so, how could this be done, and would it be a good choice?
4
votes
1 answer

How can I use machine learning to predict properties (such as the area) of simple polygons?

Imagine a set of simple (non-self-intersecting) polygons given by the coordinate pairs of their vertices $[(x_1, y_1), (x_2, y_2), \dots,(x_n, y_n)]$. The polygons in the set have a different number of vertices. How can I use machine learning to…
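One way to make this question concrete is to note that the target labels can be generated cheaply: the area of a simple polygon follows directly from its vertex list via the shoelace formula, so supervised training data is easy to produce. A small sketch of that step only (the sequence-model side is out of scope here):

```python
def shoelace_area(vertices):
    """Area of a simple polygon given as [(x1, y1), ..., (xn, yn)]."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]   # wrap around to the first vertex
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# Unit square, listed counter-clockwise.
print(shoelace_area([(0, 0), (1, 0), (1, 1), (0, 1)]))  # 1.0
```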
4
votes
0 answers

Can sequence-to-sequence models be used to convert source code from one programming language to another?

Sequence-to-sequence models have achieved good performance in natural language translation. Could these models also be applied to convert source code written in one programming language to source code written in another language? Could they also be…
3
votes
2 answers

Is seq2seq the best model when input/output sequences have fixed length?

I understand that seq2seq models are perfectly suitable when the input and/or the output have variable lengths. However, if we know the input/output sequence lengths of the neural network exactly, is this still the best approach?
Petrus
  • 31
  • 1
3
votes
2 answers

How to use an LSTM to generate a paragraph

An LSTM model can be trained to generate text sequences by feeding it the first word. After the first word is fed in, the model generates a sequence of words (a sentence): feed the first word to get the second word, feed the first word + the second…
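The loop the excerpt describes is the standard autoregressive sampling pattern. A minimal sketch of that loop, with a stand-in `next_word_probs` function that is purely hypothetical here (in a real setup it would be the trained LSTM's softmax output):

```python
import numpy as np

VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]
rng = np.random.default_rng(0)

def next_word_probs(context):
    """Hypothetical stand-in for a trained LSTM: returns a distribution
    over the vocabulary given the words generated so far."""
    logits = rng.normal(size=len(VOCAB))
    e = np.exp(logits - logits.max())
    return e / e.sum()

def generate(first_word, max_len=20):
    words = [first_word]
    while len(words) < max_len:
        probs = next_word_probs(words)
        word = VOCAB[int(rng.choice(len(VOCAB), p=probs))]
        if word == "<eos>":              # stop token ends the sentence
            break
        words.append(word)               # feed everything generated so far back in
    return " ".join(words)

print(generate("the"))
```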
2
votes
1 answer

Can the recurrent neural network's input come from a short-time Fourier transform?

Can a recurrent neural network's input come from a short-time Fourier transform? I mean that the input would not come from the time domain.
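For context, this is a common pipeline: compute the STFT and treat its frames (one spectrum per time step) as the recurrent network's input sequence. A hedged sketch using `scipy.signal.stft`, with the signal and parameters invented for illustration:

```python
import numpy as np
from scipy.signal import stft

fs = 16000                                   # sample rate in Hz
t = np.arange(fs) / fs                       # one second of "audio"
signal = np.sin(2 * np.pi * 440 * t)         # toy 440 Hz tone

f, frame_times, Zxx = stft(signal, fs=fs, nperseg=512)
features = np.abs(Zxx).T                     # shape (n_frames, n_freq_bins)
print(features.shape)                        # each row is one RNN time step
```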
2
votes
1 answer

Difference between dot product attention and "matrix attention"

As far as I know, attention was first introduced in Learning To Align And Translate. There, the core mechanism that is able to disregard the sequence length is a dynamically built matrix of shape output_size X input_size, in which every position…
Gulzar
  • 729
  • 1
  • 8
  • 23
2
votes
1 answer

Sequence embedding using an embedding layer: how does the network architecture influence it?

I want to obtain a dense vector representation of protein sequences so that I can meaningfully represent them in an embedding space. We can consider them as sequences of letters; in particular, there are 21 unique symbols, which are the amino acids…
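As a starting point before considering the surrounding architecture, an embedding layer is just a learned lookup table indexed by the 21 symbols. A framework-free sketch; the exact alphabet (20 standard amino acids plus an extra code), the embedding dimension and the example sequence are all assumptions made for illustration:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWYX"   # 21 symbols (assumed alphabet, X as the extra code)
token_to_id = {a: i for i, a in enumerate(AMINO_ACIDS)}

embedding_dim = 16
rng = np.random.default_rng(0)
# In a real model this matrix is a trainable parameter, not random.
embedding_matrix = rng.normal(size=(len(AMINO_ACIDS), embedding_dim))

sequence = "MKT"                                 # toy protein fragment
ids = np.array([token_to_id[a] for a in sequence])
vectors = embedding_matrix[ids]                  # (len(sequence), embedding_dim)
print(vectors.shape)
```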