ETA: More concise wording: Why do some implementations use batches of data taken from within the same sequence? Does this not make the cell state useless?
Take an LSTM as an example: it has a hidden state and a cell state, and both are updated as each new input is passed in. The problem is that if you use batches of data taken from the same timeframe, the hidden state and cell state can't be carried over from the previous values in the sequence.
This is a major problem, because this exact mechanic is the main advantage of an LSTM.
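To make the mechanic I mean concrete, here is a minimal PyTorch sketch of stepping through a sequence one time step at a time, so the state at step t depends on everything before it (the layer sizes and sequence length are arbitrary, just for illustration):

```python
import torch
import torch.nn as nn

# Step an LSTM cell through a sequence in order; the hidden state (h) and
# cell state (c) from step t-1 feed step t.
lstm_cell = nn.LSTMCell(input_size=1, hidden_size=32)

seq = torch.randn(100, 1, 1)       # 100 time steps, batch of 1, 1 feature
h = torch.zeros(1, 32)             # initial hidden state
c = torch.zeros(1, 32)             # initial cell state

for x_t in seq:                    # iterate over time steps in order
    h, c = lstm_cell(x_t, (h, c))  # state carried forward step by step
```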
As an example of what I mean, take a simple LSTM used to predict a sine wave. At training time, I can't see any way you could use batches: since there is only one timeseries being predicted, you have to start at time step 0 in order to properly compute and train the hidden state and cell state.
Taking batches like in the image below and computing them in parallel means the hidden and cell state can't be computed properly.
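Here is roughly the batching I'm describing, sketched in PyTorch (the window length of 10 and the layer sizes are just placeholders): one sine wave sliced into windows and fed as a batch, where nn.LSTM gives every window a fresh zero state rather than carrying state from one window to the next.

```python
import torch
import torch.nn as nn

# Slice one sine wave into windows and feed them as a batch.
# Each window starts from a zero hidden/cell state, so no state is
# carried across windows from the same sequence.
t = torch.arange(0, 100, dtype=torch.float32)
wave = torch.sin(t * 0.1)

window = 10
batch = torch.stack([wave[i:i + window] for i in range(0, 90, window)])  # (9, 10)
batch = batch.unsqueeze(-1)                                              # (9, 10, 1)

lstm = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)
out, (h_n, c_n) = lstm(batch)  # initial state defaults to zeros for every window
```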
And yet, in the example I gave, batch_size is set to 10. This makes no sense to me. It also doesn't help that TensorFlow's syntax isn't exactly the most verbose...
The only use case for batches in an LSTM that I can see is if you have multiple totally independent timeseries that can all be computed from time step 0 in parallel, each with its own cell and hidden state.
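That case would look something like this (again just a sketch with made-up sizes): several unrelated series in one batch, where each batch element gets its own hidden/cell state.

```python
import torch
import torch.nn as nn

# Several independent time series, each starting at its own time step 0,
# processed in parallel. Each batch element keeps its own state.
n_series, seq_len = 4, 50
series = torch.randn(n_series, seq_len, 1)  # 4 unrelated time series

lstm = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)
out, (h_n, c_n) = lstm(series)

print(h_n.shape)  # (1, 4, 32): one final hidden state per independent series
```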
My implementation
I actually duplicated the example LSTM from above but used PyTorch instead. My code can be found in a Kaggle notebook here, but as you can see, I've commented out the LSTM from the model and replaced it with a fc layer, which performs just as well as the LSTM, because, like I said, using batches in this way makes the LSTM utterly redundant.
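For reference, the swap I'm describing looks roughly like this (this is only a sketch of the structure, not the actual notebook code; the window and hidden sizes are placeholders):

```python
import torch
import torch.nn as nn

# Rough sketch of the swap described above: the LSTM is commented out and a
# fully connected layer maps each input window straight to a prediction.
class Model(nn.Module):
    def __init__(self, window=10, hidden=32):
        super().__init__()
        # self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(window, hidden)  # replaces the LSTM
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                    # x: (batch, window)
        return self.out(torch.relu(self.fc(x)))
```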