I'm quite new to machine learning (I followed Andrew Ng's Coursera course and am now starting the deeplearning.ai courses).
I want to classify human actions in real time, such as:
- Left arm bent
- Arm above shoulder
- ...
I first looked for pre-trained models, but I didn't find any. Because I'm still quite new, I'd like some advice on how to approach this.
My first idea is to collect enough pictures of every action and train an image classifier on them.
My second idea is to use PoseNet from TensorFlow to get the pose estimation points. I would record a video of a couple of seconds for every pose I want to track, save the estimated points for each frame, and then train a classification algorithm (a neural network) on those points.
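To make the second option concrete, this is roughly what I have in mind, as a minimal sketch rather than a working pipeline. It assumes each frame yields 17 (x, y) keypoints, which is what PoseNet reports. The wrist index, the synthetic poses, the helper names (`normalize_pose`, `train_softmax`, `make_fake_poses`) and the single-layer softmax classifier are all placeholders of mine, standing in for real recorded keypoints and a real neural network.

```python
import numpy as np

NUM_KEYPOINTS = 17  # PoseNet reports 17 body keypoints per person
LEFT_WRIST = 9      # assumed index; check it against the model's keypoint order


def normalize_pose(keypoints):
    """Center a (17, 2) array of (x, y) points and scale it to unit size,
    so the features do not depend on where the person stands in the frame."""
    kp = np.asarray(keypoints, dtype=float)
    centered = kp - kp.mean(axis=0)
    scale = np.abs(centered).max()
    if scale == 0:
        scale = 1.0
    return (centered / scale).ravel()  # flattened to shape (34,)


def train_softmax(X, y, num_classes, lr=0.1, epochs=500):
    """Fit a single-layer softmax classifier with plain gradient descent."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]  # one-hot labels
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n  # gradient of the mean cross-entropy
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(W, b, X):
    """Return the most likely class index for each row of X."""
    return np.argmax(X @ W + b, axis=1)


def make_fake_poses(rng, arm_raised, count):
    """Hypothetical stand-in for saved PoseNet output: noisy copies of one
    made-up pose, with the wrist moved up (smaller y) when arm_raised."""
    base = np.column_stack([np.linspace(0.0, 1.0, NUM_KEYPOINTS),
                            np.linspace(0.0, 2.0, NUM_KEYPOINTS)])
    poses = base + rng.normal(scale=0.02, size=(count, NUM_KEYPOINTS, 2))
    if arm_raised:
        poses[:, LEFT_WRIST, 1] -= 1.5
    return np.array([normalize_pose(p) for p in poses])


rng = np.random.default_rng(0)

# Class 0: arm down, class 1: arm above shoulder (synthetic data only).
X = np.vstack([make_fake_poses(rng, False, 100),
               make_fake_poses(rng, True, 100)])
y = np.array([0] * 100 + [1] * 100)
W, b = train_softmax(X, y, num_classes=2)

# Evaluate on freshly generated poses that were not used for training.
X_new = np.vstack([make_fake_poses(rng, False, 50),
                   make_fake_poses(rng, True, 50)])
y_new = np.array([0] * 50 + [1] * 50)
accuracy = float(np.mean(predict(W, b, X_new) == y_new))
print("held-out accuracy on synthetic poses:", accuracy)
```

With real data, `X` would hold the saved PoseNet points from the recorded clips, one row per frame, and `y` the action label of the clip each frame came from.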
Which of these two options is more efficient? Or are they both bad, and is there a better way to do this?