First of all, I recommend that you have a look at similar questions on this network, e.g. https://stackoverflow.com/questions/6499880/ios-gesture-recognition-utilizing-accelerometer-and-gyroscope and https://stackoverflow.com/questions/6368618/store-orientation-to-an-array-and-compare
Your problem can be divided into three parts.
How to gather sensor data.
How to use the gathered data to train a model.
How to use the trained model to make a prediction.
A modern smartphone packs several sensors into one device. To implement your application I recommend that you use the raw sensor data from either the gyroscope or the accelerometer.
On the Android platform, you can access these sensors and acquire the raw sensor data through the Android sensor framework, which is part of the android.hardware package.
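As a rough sketch of what that looks like in code (the class name here is just for illustration), an Activity can register a SensorEventListener for the accelerometer and ask for a sampling period of roughly 20 Hz:

```java
import android.app.Activity;
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;
import android.os.Bundle;

// Minimal sketch: register for raw accelerometer data via the Android sensor framework.
public class GestureCaptureActivity extends Activity implements SensorEventListener {

    private SensorManager sensorManager;
    private Sensor accelerometer;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        sensorManager = (SensorManager) getSystemService(SENSOR_SERVICE);
        accelerometer = sensorManager.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);
    }

    @Override
    protected void onResume() {
        super.onResume();
        // Request a sample roughly every 50 ms (about 20 Hz); the OS treats this as a hint.
        sensorManager.registerListener(this, accelerometer, 50_000);
    }

    @Override
    protected void onPause() {
        super.onPause();
        sensorManager.unregisterListener(this); // stop sampling to save battery
    }

    @Override
    public void onSensorChanged(SensorEvent event) {
        float x = event.values[0];
        float y = event.values[1];
        float z = event.values[2];
        // hand the raw values to your recording/buffering code here
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) { /* not needed here */ }
}
```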
To capture the raw sensor data, take regular samples (at 20 Hz or more) and save the maximum x, y and z values, each in its own array (so you can recognize the gesture in all three planes). If you want the gesture to be 5 seconds long, keep 100 samples per axis (at 20 Hz). Then analyze whether any of the three arrays holds values that change sinusoidally. If one does, you've got a gesture.
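A minimal sketch of such a recording buffer, assuming 100 samples per axis for a 5-second window at 20 Hz (the class and method names are made up for illustration), could look like this:

```java
import java.util.Arrays;

// Sketch of a fixed-length recording window: 100 samples per axis (~5 s at 20 Hz).
public class GestureWindow {

    public static final int WINDOW_SIZE = 100; // 20 Hz * 5 s

    private final float[] xs = new float[WINDOW_SIZE];
    private final float[] ys = new float[WINDOW_SIZE];
    private final float[] zs = new float[WINDOW_SIZE];
    private int count = 0;

    /** Add one accelerometer sample; returns true once the window is full. */
    public boolean addSample(float x, float y, float z) {
        if (count < WINDOW_SIZE) {
            xs[count] = x;
            ys[count] = y;
            zs[count] = z;
            count++;
        }
        return count == WINDOW_SIZE;
    }

    /** Largest absolute value seen on each axis, useful as a quick first check. */
    public float[] axisMaxima() {
        float mx = 0f, my = 0f, mz = 0f;
        for (int i = 0; i < count; i++) {
            mx = Math.max(mx, Math.abs(xs[i]));
            my = Math.max(my, Math.abs(ys[i]));
            mz = Math.max(mz, Math.abs(zs[i]));
        }
        return new float[] { mx, my, mz };
    }

    public float[] xAxis() { return Arrays.copyOf(xs, count); }
    public float[] yAxis() { return Arrays.copyOf(ys, count); }
    public float[] zAxis() { return Arrays.copyOf(zs, count); }
}
```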
You could store these values in an array while the user is in 'record mode'. Then, when the user tries to replicate that movement, the model could match the replicated movement's array against the recorded ones. The question is: how do you compare the arrays in a smart way? (Randy M 2011)
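Just to illustrate why a naive comparison is not enough: a per-sample distance between the recorded array and the replicated one is easy to write, but it breaks as soon as the replicated gesture is shifted or stretched in time, which is exactly why a smarter comparison, or a learned model, is needed. Something like this (illustrative only):

```java
// Naive baseline (illustrative only): mean absolute difference between two
// equally long recordings of the same axis. It fails when the replicated
// gesture is shifted or stretched in time.
public final class NaiveCompare {

    public static double meanAbsoluteDifference(float[] recorded, float[] replicated) {
        int n = Math.min(recorded.length, replicated.length);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += Math.abs(recorded[i] - replicated[i]);
        }
        return n == 0 ? 0.0 : sum / n;
    }

    /** Treat the gestures as "the same" when the average deviation is small. */
    public static boolean roughlyMatches(float[] recorded, float[] replicated, double threshold) {
        return meanAbsoluteDifference(recorded, replicated) < threshold;
    }
}
```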
This leads us to the next step, which is applying ML.
To train your model, you can choose to either train the model in the cloud or train it locally on the device. In most cases the problem with training a model on a mobile device is computational limitations: machine learning algorithms running on a mobile device need to be carefully designed and implemented, since most mobile devices have weak processors and limited RAM.
It's quite a challenge to squeeze a large neural network (NN) into the small amount of RAM that smartphones have, since NNs require the model to be fully loaded into memory. In many cases it is advisable to slim the model down, for example by setting weights that are near zero to exactly zero. (Chomba B 2017)
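In practice, "setting weights near zero to zero" boils down to a magnitude threshold. A toy sketch (the threshold value is purely illustrative), after which the zeroed weights can be stored in a sparse format:

```java
// Toy sketch of magnitude-based pruning: weights whose absolute value falls
// below a threshold are clamped to exactly zero so the model can be stored
// sparsely. The 0.01 threshold is illustrative, not a recommendation.
public final class WeightPruning {

    public static int pruneInPlace(float[] weights, float threshold) {
        int pruned = 0;
        for (int i = 0; i < weights.length; i++) {
            if (Math.abs(weights[i]) < threshold) {
                weights[i] = 0f;
                pruned++;
            }
        }
        return pruned; // how many weights were zeroed out
    }

    public static void main(String[] args) {
        float[] weights = { 0.5f, -0.004f, 0.2f, 0.0009f, -0.7f };
        int pruned = pruneInPlace(weights, 0.01f);
        System.out.println(pruned + " weights set to zero");
    }
}
```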
In case you decide to use the cloud, your mobile app simply sends an HTTPS request to a cloud web service along with the required raw sensor data, and within seconds the service replies with the prediction results. There are several cloud services you can leverage to host the server side of your application, e.g. Microsoft Azure Cognitive Services, Clarifai and Google Cloud Cognition, but personally I recommend that you consider reality.ai, which is an AI tool built specifically for engineers working with signals and sensors.
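The exact request format depends entirely on the service you pick (reality.ai and the others each have their own APIs and client libraries), so the following is only the general pattern, with a placeholder URL and payload, using plain HttpURLConnection; in a real app you would run this off the UI thread:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// General pattern only: POST a window of sensor samples to a prediction endpoint
// and read back the result. The URL and payload format are placeholders; every
// cloud service defines its own API.
public final class CloudPredictionClient {

    public static String predict(float[] xs, float[] ys, float[] zs) throws Exception {
        // Hypothetical endpoint, replace with the one your service provides.
        URL url = new URL("https://example.com/api/gesture/predict");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        String body = "{\"x\":" + toJsonArray(xs)
                + ",\"y\":" + toJsonArray(ys)
                + ",\"z\":" + toJsonArray(zs) + "}";

        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                response.append(line);
            }
            return response.toString(); // e.g. a JSON body with the predicted gesture
        } finally {
            conn.disconnect();
        }
    }

    private static String toJsonArray(float[] values) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(values[i]);
        }
        return sb.append(']').toString();
    }
}
```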
The next step is to select an appropriate algorithm for classifying the gesture. Here you can employ Logistic Regression, Support Vector Machines, Random Forests or Extremely Randomized Trees, depending on your app's use case. To train the model, you provide the learning algorithm with labeled examples. The ML algorithm then extracts features from the raw sensor data and constructs a mathematical model that can accurately describe gestures such as roll and pan.
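Whichever classifier you pick, the usual first step is turning each recorded window into a fixed-length feature vector. The features below (mean, standard deviation and peak magnitude per axis) are just one common starting point, not a prescription:

```java
// Sketch of turning one recorded window into a fixed-length feature vector for a
// classifier (logistic regression, SVM, random forest, ...). The chosen features
// (mean, standard deviation, peak magnitude per axis) are one common starting point.
public final class FeatureExtractor {

    /** 3 axes x 3 statistics = 9 features per gesture window. */
    public static double[] extract(float[] xs, float[] ys, float[] zs) {
        double[] features = new double[9];
        fill(features, 0, xs);
        fill(features, 3, ys);
        fill(features, 6, zs);
        return features;
    }

    private static void fill(double[] out, int offset, float[] axis) {
        if (axis.length == 0) return; // leave features at zero for an empty window

        double mean = 0.0, peak = 0.0;
        for (float v : axis) {
            mean += v;
            peak = Math.max(peak, Math.abs(v));
        }
        mean /= axis.length;

        double variance = 0.0;
        for (float v : axis) {
            variance += (v - mean) * (v - mean);
        }
        double stdDev = Math.sqrt(variance / axis.length);

        out[offset] = mean;
        out[offset + 1] = stdDev;
        out[offset + 2] = peak;
    }
}
```

A labeled example is then simply one such feature vector paired with the name of the gesture the user performed, e.g. 'roll' or 'pan'.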