4

I’ve done my research and could not find answer anywhere else. My apologies in advance if same problem is answered in different terms on stack-overflow.

I am trying to solve poker tournament winner prediction problem. I’ve millions of historical records in this format:

  • Players ==> Winner
  • P1,P2,P4,P8 ==> P2
  • P4,P7,P6 ==> P4
  • P6,P3,P2,P1 ==> P1

What are some of the most suitable algorithms to predict winner from set of players.

So far I have tried decision trees, XGboost without much of a success.

DukeZhou
  • 6,237
  • 5
  • 25
  • 53
  • Welcome to SE:AI! *I notice you are new to Stack Exchange and also asked this question on Data Science. Generally that is frowned upon, but, as there is some overlap between our two Stacks, I only ask that you make sure to "prune" the question from one of the Stacks if it doesn't receive an answer.* – DukeZhou Aug 20 '18 at 18:26
  • 1
    This question is answered on [Data Science](https://datascience.stackexchange.com/questions/37202/poker-tournament-winner-prediction/37217#37217) – Tejas Patel Aug 24 '18 at 02:25

1 Answers1

1

This looks like it maps very nicely onto Association Mining. In association mining, you are trying to find items from a discrete set that often co-occur in transactions. For instance, you might want to find the items that most commonly appear in an online shopping cart together.

In your case, the problem amounts to:

  1. Split up the data into sub-problems by who won.
  2. Perform association mining on the sets of players in each sub-problem, using, e.g. the apriori algorithm.

The resulting rules will then have associated confidence factors. When you want to predict who will win, you can take the winner suggested by the rule with the greatest confidence.


The other approach suggested by your problem is to model it as classification. Here you are trying to assign labels (who won) to input vectors (who played). The process would look like:

  1. Map the sets of who played into binary features. So the subset P1,P2, P4,P8 might be represented with {1,1,0,1,0,0,0,1} if there are only 8 players. Map the winning player onto a numeric class (e.g. the numbers 1-8).
  2. Run any classification algorithm to create a model that predicts the class from the binary features. A decision tree learner might be a interesting starting point if you want to understand which factors are important to the model. There are many other techniques however.

You can use the model you train to make predictions about future games with the same set of players.

John Doucette
  • 9,147
  • 1
  • 17
  • 52
  • 1
    John, thank you for directions. Per my understanding apriori algorithm only uses X as an input, how do I factor in Y (target or winner )? Am I missing something? – Tejas Patel Aug 20 '18 at 20:05
  • 1
    As mentioned in step 1, you want to split the data into separate problem instances for every target (Y). Then find players that co-occur in the input sets (X) for that target. Filter out the obvious rule (that Y always appears in X), and you'll end up with player combinations that Y often defeats, and a confidence estimate for each. – John Doucette Aug 20 '18 at 20:17
  • 1
    Prof. Doucette, unfortunately there are more than 6000 payer profiles. Is there a deep learning alternative to manual formation of separate problem instances? – Tejas Patel Aug 20 '18 at 21:17
  • 1
    No need for deep learning here. If you wanted to go that route, then you could phrase it as a classification task, like in the second half of my answer, and then try a deep learning algorithm. You'll probably be happier with the decision tree though, unless you have a good idea of what you're doing. – John Doucette Aug 21 '18 at 01:21
  • John, this [question appears to have an accepted answer on Data Science](https://datascience.stackexchange.com/questions/37202/poker-tournament-winner-prediction/37217#37217), but I'm inclined to leave it open on AI because some effort was put into this answer, which provides useful links and concepts involving the context. Let me know if you have any thoughts on it, either way. – DukeZhou Aug 24 '18 at 19:07
  • @DukeZhou Yeah, it's definitely got overlap with Data Science, but I agree that the fields have enough overlap it could make sense to keep this open. Their answer uses an ELO system, whereas mine uses more general purpose techniques. – John Doucette Aug 26 '18 at 00:37
  • Split by winner implies a loss of information. By example: A,B->A; B,C->B; A,B,C->? – pasaba por aqui Aug 28 '18 at 09:03
  • @pasabaporaqui If you don't have any training data with an outcome for A,B,C, then no information was lost. If you do have an outcome, then the information is used by the sub-problem that predicts the winner on the basis of that information. Either way, I don't see how information is lost. – John Doucette Aug 28 '18 at 10:58
  • The answer to the example is A, assuming "win" is a transitive property. Using your proposal, no answer is reached – pasaba por aqui Aug 28 '18 at 11:02
  • @pasabaporaqui You've assumed that poker is transitive, but there's no way of processing this data that will reveal that fact if the players never play together (and in fact, we don't _know_ that A beats C, or even that the inference is reasonable, without knowing the players). The ELO approach suggested by the Data Science answer embeds this assumption by ranking the players in a 1-d space. This will probably work for expert play, but I suspect poker at a low level is intransitive, and might depend on factors like who knows the tells of other players. – John Doucette Aug 28 '18 at 11:08
  • Thus, your answer is "you can only decide a case already included in trainibg set"? Thus, you do not need split. I think interesting you clarify your answer with an example. – pasaba por aqui Aug 28 '18 at 11:13
  • @pasabaporaqui I'm not sure I understand you. We can make a guess on out of sample data, as with any other method (even ELO). The bias in this technique's guess will be that a player who beats _both_ other players separately will get a strong vote of confidence, and a player who beats _either_ of the others will receive moderate win confidence. This is also potentially untrue (maybe, in context, C knows the tell of A, and pumps money out of them to win). We're dancing around No Free Lunch theorem here: the choice of correct bias will depend on your problem and players. – John Doucette Aug 28 '18 at 11:17