
I have a video dataset as follows.

Dataset size: 1k videos

Frames per video: 4k (average) and 8k (maximum)

Labels: Each video has one label.

So my input will have shape (N, 8000, 64, 64, 3), where 64 is the height and width of each frame. I use Keras. I am not sure how to do end-to-end training with this kind of dataset. I was thinking of dividing each video into blocks of frames, giving an input of shape (N, 80, 100, 64, 64, 3), that is, 80 blocks of 100 frames each. But I don't think that will work for end-to-end training of a network either.
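To make the block idea concrete, here is a minimal NumPy sketch of the reshaping I have in mind. It assumes shorter videos are zero-padded up to the 8k-frame maximum. The function name `split_into_blocks` and the returned mask are only illustrative and are not part of Keras.

```python
import numpy as np

def split_into_blocks(video, block_len=100, max_frames=8000):
    """Zero-pad a (T, H, W, C) video to max_frames and split it into blocks.

    Returns:
        blocks: array of shape (max_frames // block_len, block_len, H, W, C)
        mask:   boolean array, True for blocks that contain at least one real frame
    """
    t, h, w, c = video.shape
    if max_frames % block_len != 0:
        raise ValueError("max_frames must be a multiple of block_len")
    if t > max_frames:
        raise ValueError("video is longer than max_frames")

    # Assumption: padding with zeros at the end is acceptable for the model.
    padded = np.zeros((max_frames, h, w, c), dtype=video.dtype)
    padded[:t] = video

    n_blocks = max_frames // block_len
    blocks = padded.reshape(n_blocks, block_len, h, w, c)

    # A block is "real" if its first frame index is below the true length.
    mask = np.arange(n_blocks) * block_len < t
    return blocks, mask
```

With the defaults, one video becomes an array of shape (80, 100, 64, 64, 3). Stored as uint8 that is 8000 × 64 × 64 × 3 = 98,304,000 bytes, roughly 98 MB per padded video, so I would probably have to do this per video (for example in a generator) rather than for all 1k videos at once.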

I am not in favor of dropping the frames. That might be my last choice.

Any help will be appreciated. Thanks in advance.

manv
