
I'm currently doing some research on video recognition. What I'm trying to do is similar to this paper.

The idea is that to process a given input video clip (shape: [T, C, H, W]), the model needs the features of the clip from the previous timestamp; in other words, we are trying to build a long-term feature memory.
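To make that dependency concrete, here is a minimal NumPy sketch of a per-stream feature memory. It assumes each batch slot carries one video stream and that every clip is summarized by a fixed-size feature vector. `FeatureMemory`, `read`, `write`, and `reset` are hypothetical names for illustration, not the design used in the paper referred to above.

```python
from collections import deque

import numpy as np


class FeatureMemory:
    """Hypothetical FIFO memory keeping the last `max_len` clip features per stream."""

    def __init__(self, num_streams: int, feat_dim: int, max_len: int):
        self.feat_dim = feat_dim
        self.max_len = max_len
        # deque(maxlen=...) discards the oldest feature automatically when full
        self.slots = [deque(maxlen=max_len) for _ in range(num_streams)]

    def read(self) -> np.ndarray:
        # Returns (num_streams, max_len, feat_dim); streams with fewer stored
        # features (e.g. the start of a video) are zero-padded at the front,
        # so the newest feature is always in the last position.
        out = np.zeros((len(self.slots), self.max_len, self.feat_dim), dtype=np.float32)
        for i, slot in enumerate(self.slots):
            if slot:
                out[i, self.max_len - len(slot):] = np.stack(list(slot))
        return out

    def write(self, feats: np.ndarray) -> None:
        # feats: (num_streams, feat_dim), one new clip feature per stream;
        # np.array copies so later changes to `feats` do not alter the memory
        for slot, f in zip(self.slots, feats):
            slot.append(np.array(f, dtype=np.float32))

    def reset(self, stream: int) -> None:
        # Call when a slot switches to a new video so features do not leak across videos
        self.slots[stream].clear()
```

If the features come from a PyTorch model, you would likely store `feat.detach()` so that gradients do not flow back into earlier clips.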

As a result, we have to read consecutive video clips sequentially, and the dataloader outputs data like this (each number is the timestamp of a video clip, and each row is one batch):

[1,  9, 18, 27, 36]
[2, 10, 19, 28, 37]
[3, 11, 20, 29, 38]
[4, 12, 21, 30, 39]
...
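The batch layout above can be reproduced with a small index generator. This is a sketch assuming the clips are numbered consecutively from 0 and split into `batch_size` equal contiguous streams, so the strides come out uniform rather than exactly matching the 1-indexed rows shown. `streaming_batches` is a hypothetical helper; its output, converted to a list, could be passed as the `batch_sampler` argument of a PyTorch `DataLoader`.

```python
def streaming_batches(num_clips: int, batch_size: int):
    """Yield index batches in which slot b walks through its own contiguous chunk.

    Clips are assumed to be indexed 0..num_clips-1 in temporal order; any
    remainder that does not fill a whole chunk is dropped.
    """
    chunk = num_clips // batch_size  # number of clips per stream
    starts = [b * chunk for b in range(batch_size)]
    for t in range(chunk):
        # Row t holds the t-th clip of every stream, so row t+1 continues row t
        yield [s + t for s in starts]
```

For example, `list(streaming_batches(45, 5))[:2]` gives `[[0, 9, 18, 27, 36], [1, 10, 19, 28, 37]]`: every slot sees its next clip in the following batch, which is exactly what prevents ordinary shuffling.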

The problem is that, because of the sequential reading, I can't shuffle the dataloader, which unsurprisingly results in pretty bad performance.

For this kind of streaming loading problem, are there any tips for mitigating the poor performance caused by the lack of shuffling?

Note: I have run the following experiments to verify that the problem really is caused by the lack of shuffling:

  1. original dataloader with shuffling: good accuracy
  2. original dataloader without shuffling: worst accuracy
  3. the dataloader described above: medium accuracy, but still bad
Henry

1 Answer


There is no way to make the model or training pipeline adapt to sequentially loaded data, since with minibatching the model pipeline is pretty much independent of the data pipeline.

There are, however, some ways I can think of to make random shuffling of video segments practical.

  • The first is to convert all the videos into uint8 4D NumPy arrays of shape (T, H, W, C); you can then use memmap to read a segment during data loading, which costs very little. The obvious downside is that you need a lot of disk space: converting from mp4 to npy can increase the size by roughly 100x.

  • You can pre-slice the videos into segments of the length your model expects. This lets you shuffle the segments easily.

  • Use a very efficient video decoder such as OpenCV, FFmpeg, or the recent PyTorch video loading API. However, this is still bound to be slow, since you are randomly seeking into the middle of a video.
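The memmap suggestion in the first bullet above can be sketched with NumPy alone. This assumes the video has already been decoded into a uint8 array of shape (T, H, W, C) (the mp4 decoding step is not shown); `save_video`, `read_segment`, and `raw_size_bytes` are hypothetical helper names.

```python
import numpy as np


def save_video(frames: np.ndarray, path: str) -> None:
    # frames: a decoded video as a uint8 array of shape (T, H, W, C)
    if frames.dtype != np.uint8 or frames.ndim != 4:
        raise ValueError("expected a uint8 array of shape (T, H, W, C)")
    np.save(path, frames)


def read_segment(path: str, start: int, length: int) -> np.ndarray:
    # mmap_mode="r" maps the file instead of reading it all into memory;
    # slicing along T only pulls the requested frames from disk
    video = np.load(path, mmap_mode="r")
    return np.array(video[start:start + length])  # copy the slice into RAM


def raw_size_bytes(t: int, h: int, w: int, c: int = 3) -> int:
    # One byte per uint8 value: the uncompressed footprint of the .npy data
    return t * h * w * c
```

The size helper shows where the disk cost comes from: 300 frames (10 s at 30 fps) at 224×224×3 is `raw_size_bytes(300, 224, 224)` = 45,158,400 bytes, about 45 MB before any compression.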
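For the pre-slicing bullet above, here is a minimal sketch assuming non-overlapping segments and that trailing frames which don't fill a segment are dropped. In practice each segment would be written to its own file. `preslice` and `shuffled_segment_index` are hypothetical names.

```python
import random

import numpy as np


def preslice(frames: np.ndarray, seg_len: int) -> list:
    # Cut a (T, H, W, C) video into non-overlapping segments of seg_len frames;
    # trailing frames that do not fill a whole segment are dropped
    n = frames.shape[0] // seg_len
    return [frames[i * seg_len:(i + 1) * seg_len] for i in range(n)]


def shuffled_segment_index(video_lengths: dict, seg_len: int, seed: int = 0) -> list:
    # Each (video_id, seg_idx) pair names one pre-sliced segment; once segments
    # are stored separately, shuffling them is just shuffling this list
    index = [(vid, k) for vid, t in video_lengths.items() for k in range(t // seg_len)]
    random.Random(seed).shuffle(index)
    return index
```

For example, a 50-frame video with `seg_len=16` yields three segments, and `shuffled_segment_index({"a": 50, "b": 32}, 16)` returns the five `(video_id, seg_idx)` pairs in random order.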
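Why random access stays slow even with a fast decoder, as the last bullet above notes, can be estimated with a simplified model. In codecs like H.264, most frames are stored as differences from earlier frames, so reading from an arbitrary position usually means decoding from the preceding keyframe. The sketch assumes keyframes at a fixed interval and ignores B-frame reordering and scene-cut keyframes. `frames_to_decode` is a hypothetical helper.

```python
def frames_to_decode(start: int, length: int, keyframe_interval: int) -> int:
    # Nearest keyframe at or before `start`, assuming a keyframe every
    # `keyframe_interval` frames (a simplification of real GOP structures)
    keyframe = (start // keyframe_interval) * keyframe_interval
    # Decoding has to run from that keyframe through the last requested frame
    return start + length - keyframe
```

With a 250-frame interval (x264's default maximum), `frames_to_decode(240, 16, 250)` is 256, so 16 useful frames cost 256 decoded ones. A clip that starts right on a keyframe, `frames_to_decode(250, 16, 250)`, costs only 16.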

PeaBrane
  • Sorry, my question may have been misunderstood. What I want to do is read consecutive video clips (segments), not frames, so I can't shuffle the clips. But thanks for your reply anyway! – Henry Sep 07 '22 at 05:59