Questions tagged [action-recognition]

For questions about action recognition. Use this tag when asking about approaches that could complement or hinder it.

13 questions
7
votes
5 answers

How can action recognition be achieved?

For example, I would like to train my neural network to recognize the type of actions (e.g. in commercial movies or some real-life videos), so I can "ask" my network in which video or movie (and at what frames) somebody was driving a car, kissing,…
kenorb
4
votes
1 answer

What topologies support recognition of action sequences?

The ability to recognize an object with particular identifying features from single or multiple camera shots, with the temporal dimension digitized as frames, has been shown. The proof is that the movie industry uses face replacement to reduce…
4
votes
1 answer

Applications of CNN for detecting crime from video surveillance cameras

Inspired by this discussion about recognizing human actions, I have found the Fall-Detection project, which detects humans falling on the ground from a CCTV camera feed and could alert hospital authorities. My question is, are…
3
votes
1 answer

How should continuous action/gesture recognition be performed differently than isolated action recognition?

I am going to train a deep learning model to classify hand gestures in video. Since the person will be taking up nearly the entire width/height of the video and I will be classifying what hand gesture he or she is doing, I don't need to identify the…
3
votes
1 answer

How can I do video classification while taking into account the temporal dependencies of the frames?

I need to solve a video classification problem. While looking for solutions, I only found solutions that transform this problem into a series of simpler image classification tasks. However, this method has a downside: we ignore the temporal…
2
votes
1 answer

What type of neural network do you need if you want to detect an action or dynamic pattern instead of a static pattern?

Let's say that you want to detect whether a man is running, walking, or dancing, rather than just detecting a man standing still. What type of neural network would you use for this purpose?
2
votes
1 answer

Why do action recognition algorithms perform better on the UCF101 dataset than on the HMDB51 dataset?

If we look at state-of-the-art accuracy on the UCF101 dataset, it is around 93%, whereas for the HMDB51 dataset it is around 66%. I looked at both datasets, and both contain videos of similar lengths. I was wondering if anyone could give an…
2
votes
2 answers

Can PDDL be utilized for action recognition?

The Planning Domain Definition Language (PDDL) is known for its capabilities of symbolic planning in the state space. A solver will find a sequence of steps to bring the system from a start state to the goal state. A common example of this is the…
user11571
1
vote
0 answers

Is there a way, while training (with contrastive learning) the embedding network, to find the test accuracy?

I aim to do action recognition in videos on a private dataset. To compare with the existing state-of-the-art implementations, other authors have published their code on GitHub, like the one here (for the paper Self-supervised Video Representation Learning…
1
vote
0 answers

What are the pros and cons of 3D CNN and 2D CNN combined with optical flow for action recognition?

For action recognition or similar tasks, one can either use a 3D CNN or combine a 2D CNN with optical flow. See this paper for details. Can someone explain the pros and cons of each, in terms of accuracy and costs such as computation and memory requirements, etc.?…
1
vote
1 answer

What is "temporal depth"?

I need some explanation about the following paragraph (page 3) from the paper A Novel Approach for Robust Multi Human Action Detection and Recognition based on 3-Dimentional Convolutional Neural Networks. We introduce a 3D convolution neural…
0
votes
1 answer

How to improve the performance when no shuffling of dataloader is needed?

I'm currently doing some research on video recognition. What I'm trying to do is similar to this paper. The idea is that, to process a specific input video clip (shape: [T, C, H, W]), it needs the features of the video clip from the last timestamp, where we…
0
votes
1 answer

Can I flip a video to generate more data for action recognition?

There are 8 distinct action classes and around 50+ videos per class. I was wondering if flipping videos from the training set can be a good option to generate additional data. Is it?
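
For illustration, the flipping idea in the question above could be sketched as follows. This is a minimal sketch, not a recommendation: the function names `hflip_clip` and `augment_with_flips` are hypothetical, and the clip layout [T, H, W, C] is an assumption. Mirroring only preserves the label when the action classes do not depend on left/right direction (a "swipe left" gesture, for example, would become "swipe right").

```python
import numpy as np


def hflip_clip(clip: np.ndarray) -> np.ndarray:
    """Mirror every frame of a clip left-to-right.

    Assumes `clip` has shape [T, H, W, C] (frames, height, width, channels).
    The same flip is applied to all frames so motion stays temporally consistent.
    """
    return clip[:, :, ::-1, :].copy()


def augment_with_flips(clips, labels):
    """Return the original clips plus one mirrored copy of each.

    Labels are duplicated unchanged, which is only valid when the classes
    are not direction-sensitive (an assumption of this sketch).
    """
    flipped = [hflip_clip(c) for c in clips]
    return list(clips) + flipped, list(labels) + list(labels)
```

Applied to a training set of 8 classes with about 50 videos each, this would double the number of training clips; the validation and test sets would normally be left unflipped.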