3

I need to solve a video classification problem. While looking for solutions, I only found solutions that transform this problem into a series of simpler image classification tasks. However, this method has a downside: we ignore the temporal relationship between the frames.

So, how can I perform video classification, with a CNN, by taking into account the temporal relationships between frames?

nbro
  • 39,006
  • 12
  • 98
  • 176
  • I edited this post to simplify and clarify it, and to make it more consistent with the answer that you accepted. Please, make sure that I did not change the meaning of it. If I did, you can just edit your post again and clarify what you're asking. If not, please, flag this comment for deletion. – nbro Dec 29 '20 at 11:13

1 Answers1

1

Look at spatio-temporal CNNs which extend the image-based CNN in 2D to 3D to handle time. These are commonly used to detect or classify action in a video. People have used them to identify specific actions in various sports such as kicking a soccer ball, throwing a baseball or dribbling a basketball. They have been used to identify fire, smoke, deep fakes, and violence in videos. Below are some articles to help get started.

Source code:

Brian O'Donnell
  • 1,853
  • 6
  • 20