
I have to analyse sequences of actions that look more or less like this JSON blob. The question I'm trying to answer is whether there are recurring (sub)patterns that different users adopt when asked to perform a specific task: in this case, building a mathematical formula using this editor. In particular, I'd like to know whether there are multiple significantly different ways in which people build the same expression.

I thought of creating a Markov model, but that would only give me the most likely sequence of actions of length N. An obvious alternative would be to build trees and count how many times a certain path occurs in the dataset. However, because of the way expressions get built, the sequences can be polluted by many confounding, non-significant actions (such as streaks of UNDO-REDO, deleted symbols, and the like).
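A minimal sketch of the "clean first, then model" idea, assuming each session is a plain list of action names. It replays the log with an undo/redo stack so that UNDO-REDO streaks cancel out, optionally drops actions you decide are noise (treating `TRASH_SYMBOL` as noise is an assumption, as are the names `DRAG_STOP`, `effective_actions` and `transition_probabilities`), and then estimates first-order Markov transition probabilities from the cleaned sequences.

```python
from collections import Counter, defaultdict


def effective_actions(actions, noise=frozenset()):
    """Replay an action log, resolving UNDO/REDO against a stack.

    Assumes (hypothetically) that "UNDO" reverts the last kept action,
    "REDO" re-applies the last undone one, and any other action clears
    the redo stack, as in most editors. Actions in `noise` are dropped
    from the result afterwards.
    """
    kept, redo = [], []
    for action in actions:
        if action == "UNDO":
            if kept:
                redo.append(kept.pop())
        elif action == "REDO":
            if redo:
                kept.append(redo.pop())
        else:
            kept.append(action)
            redo.clear()  # a new action invalidates the redo history
    return [a for a in kept if a not in noise]


def transition_probabilities(sequences):
    """First-order Markov model: P(next action | current action)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for cur, c in counts.items()
    }


log = ["DRAG_START", "DRAG_STOP", "UNDO", "REDO", "DRAG_START", "UNDO"]
print(effective_actions(log))  # ['DRAG_START', 'DRAG_STOP']
```

Counting paths in a tree would work on the same cleaned sequences; the cleaning step is what keeps UNDO-REDO streaks from splitting otherwise identical paths.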

I might go the "longest common subsequence" route, but I'm not sure that would tell me whether there are "significantly different" ways of building the same expression. (I use quotes because, for now, I don't have a rigorous definition of "significantly different". For example, one way would be to drag and drop-in-place all the symbols in the correct order; another would be to drag all the symbols onto the canvas first and then place them in the correct spots.)
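For reference, a sketch of the longest-common-subsequence route using the classic dynamic-programming table, plus a similarity score normalised by the longer sequence (that normalisation is one choice among several). The two "strategies" below use hypothetical action names (`DROP_IN_PLACE`, `DROP_ON_CANVAS`) to mirror the two building styles described above.

```python
def lcs(a, b):
    """Longest common subsequence of two action lists (O(len(a) * len(b)) DP)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back through the table to recover one LCS.
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def lcs_similarity(a, b):
    """LCS length divided by the longer sequence's length, in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(lcs(a, b)) / max(len(a), len(b))


# Hypothetical sessions for the two strategies.
in_place = ["DRAG_START", "DROP_IN_PLACE"] * 2
canvas_first = ["DRAG_START", "DROP_ON_CANVAS"] * 2 + ["DRAG_START", "DROP_IN_PLACE"] * 2
print(lcs_similarity(in_place, canvas_first))  # 0.5
```

This illustrates the concern: the whole in-place session is a subsequence of the canvas-first one, so the LCS only registers the difference through the length normalisation, not through the order in which symbols are handled.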

I thought this might be a nice challenge for some AI algorithm, but I'm quite a noob at that, so I'm open to suggestions.

Morpheu5
    Thanks for your comment. To answer your questions: (1) Yes, exact matches would be ideal, although I am prepared to drop fields if this improves pattern detection. (2) Ideally exact, but some sequences of actions could lead to the same end result, so as long as I can control for that (i.e. specify which actions can be swapped), I am willing to accept fuzzy matching. – Morpheu5 Sep 22 '17 at 09:36

1 Answer


This task falls within the overlapping fields of information extraction and pattern mining. Information extraction involves automatically extracting instances of specified relations from data, while pattern mining involves using data mining algorithms to discover interesting, unexpected and useful patterns in databases (Philippe F).
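As a very simplified stand-in for sequential pattern mining (proper algorithms such as PrefixSpan or SPADE also find patterns with gaps), the sketch below counts the support of contiguous action n-grams, i.e. the number of user sessions that contain each n-gram at least once, and keeps those above a threshold. The function name and the toy data are hypothetical.

```python
from collections import Counter


def frequent_ngrams(sequences, n, min_support):
    """Contiguous n-grams occurring in at least `min_support` sequences.

    Support counts sessions, not total occurrences, so one user repeating
    a pattern many times does not inflate it.
    """
    support = Counter()
    for seq in sequences:
        # A set, so each n-gram counts at most once per session.
        grams = {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}
        support.update(grams)
    return {gram: s for gram, s in support.items() if s >= min_support}


sessions = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"]]
print(frequent_ngrams(sessions, n=2, min_support=2))
```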

In your question you mention considering Markov models, but note that a plain Markov model would only give you the most likely sequence of actions of length N. If you prefer to stay with Markov models, a better approach would be hierarchical (hidden) Markov models. These have multiple 'levels' of states that can describe input sequences at different levels of granularity. They are well suited to categorizing human behavior at various levels of abstraction: for example, a person's location in a room can be further interpreted to infer more complex information, such as which activity the person is performing.

However, my recommendation is that you implement random forest classifiers for this problem. Random forests often give good classification accuracy with a relatively simple implementation, and you can inspect individual trees and feature importances to guide parameter tuning. You can also use cross-validation to evaluate your model and estimate its accuracy. Note that a random forest is a supervised method: each action sequence has to be encoded as a fixed-length feature vector and given a label (for example, the building strategy it represents) before training. Consider using the random forest classifier implementation in Python's scikit-learn library for this analysis.
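A minimal sketch of that pipeline with scikit-learn, assuming you have hand-labelled some sessions by strategy. The bag-of-actions-plus-bigrams encoding, the labels `in_place`/`canvas_first`, and the action names are all hypothetical choices for illustration; `DictVectorizer` turns the per-session counts into a fixed-length matrix.

```python
from collections import Counter

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import cross_val_score


def featurize(seq):
    """Counts of each action plus counts of adjacent action pairs."""
    feats = Counter(seq)
    feats.update(f"{a}>{b}" for a, b in zip(seq, seq[1:]))
    return dict(feats)


# Toy, hand-labelled sessions (hypothetical action names and labels).
in_place = ["DRAG_START", "DROP_IN_PLACE"]
canvas = ["DRAG_START", "DROP_ON_CANVAS"]
sequences, labels = [], []
for k in range(2, 8):
    sequences.append(in_place * k)
    labels.append("in_place")
    sequences.append(canvas * k + in_place * k)
    labels.append("canvas_first")

vec = DictVectorizer(sparse=False)
X = vec.fit_transform([featurize(s) for s in sequences])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, labels, cv=3)  # stratified 3-fold accuracy
print("CV accuracy per fold:", scores)

# Fit on everything and list the most informative features.
clf.fit(X, labels)
ranked = sorted(zip(clf.feature_importances_, vec.feature_names_), reverse=True)
for importance, name in ranked[:3]:
    print(f"{name}: {importance:.3f}")
```

The feature importances are one way to "inspect" the forest: features that separate strategies well (here, anything involving the canvas drop) should rank highly.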

Your JSON action log records events such as DRAG_START, TRASH_SYMBOL, OPEN and CLOSE. For your model to be accurate, I suggest you also record lower-level features such as: time between clicks, changes in the direction of mouse motion, per-region screen hover counts, task completion time, and the time between a click and the following mouse movement.
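Some of these can be derived from the existing log if every event carries a timestamp. The sketch below assumes hypothetical `"action"` and `"timestamp"` (milliseconds) keys, events sorted by time, and that `DRAG_START` stands in for a click; mouse-direction changes and hover counts would need raw pointer events that the current log does not appear to contain.

```python
from statistics import mean


def timing_features(events, click_actions=frozenset({"DRAG_START"})):
    """Derive simple timing features from a timestamped, time-sorted log.

    Assumes each event is a dict with hypothetical keys "action" and
    "timestamp" (milliseconds).
    """
    times = [e["timestamp"] for e in events]
    clicks = [e["timestamp"] for e in events if e["action"] in click_actions]
    gaps = [b - a for a, b in zip(clicks, clicks[1:])]
    return {
        "completion_time_ms": times[-1] - times[0] if times else 0,
        "n_events": len(events),
        "mean_click_gap_ms": mean(gaps) if gaps else 0.0,
    }


events = [
    {"action": "OPEN", "timestamp": 0},
    {"action": "DRAG_START", "timestamp": 1000},
    {"action": "DRAG_START", "timestamp": 2500},
    {"action": "DRAG_START", "timestamp": 3500},
    {"action": "CLOSE", "timestamp": 5000},
]
print(timing_features(events))
```

These per-session numbers can be merged into the same feature dictionary used for the classifier.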

For further reference, I recommend the papers below, which I found useful and relevant to your question.

Hierarchical Hidden Markov Models for Information Extraction https://www.biostat.wisc.edu/~craven/papers/ijcai03.pdf

Detecting Abnormal User Behavior Through Pattern-mining Input Device Analytics https://www.ignacioxd.com/files/bib/Dominguez2015-Concentration.pdf

Hierarchical Hidden Markov Model in Detecting Activities of Daily Living in Wearable Videos for Studies of Dementia https://arxiv.org/ftp/arxiv/papers/1111/1111.1817.pdf

Seth Simba