
I'm trying to implement an MCTS-based AI for a trick-taking card game.

The game (Belote):

  • A round consists of 8 tricks.
  • In a trick, each of the 4 players successively plays 1 (legal) card from their hand (each hand starts with 8 cards).
  • A player can only play certain cards from their hand (depending on the current state of the trick). Thus, the other players can infer information about a player's hand from the current and past tricks.

My implementation so far:

The node:

The current state of the round (previous tricks, current trick, the play order...) + the position of the "smart" player.
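
For concreteness, here is a minimal sketch of how I picture such a node (the class and field names are illustrative, not my exact code):

```python
from dataclasses import dataclass, field

Card = str  # e.g. "J♠" (illustrative encoding)

@dataclass
class RoundState:
    hand: list[Card]                                   # the smart player's own cards
    previous_tricks: list[list[Card]] = field(default_factory=list)
    current_trick: list[Card] = field(default_factory=list)
    play_order: list[int] = field(default_factory=lambda: [0, 1, 2, 3])
    smart_player: int = 0                              # position of the "smart" player

class Node:
    """MCTS tree node: a RoundState plus the usual visit statistics."""
    def __init__(self, state: RoundState, parent=None, move=None):
        self.state = state
        self.parent = parent
        self.move = move          # the card (or whole trick) that led to this node
        self.children = []
        self.visits = 0
        self.total_reward = 0.0
```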

Expansion function:

I can think of 2 possibilities:

  • a card is played by a player
  • a whole new trick is played
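
To illustrate the first option, expansion would add one child per card the player to move could legally play. A rough sketch, reusing the Node/RoundState above; the helper methods are hypothetical placeholders for the tools listed below:

```python
def expand_by_card(node: Node) -> list[Node]:
    """Option 1: one child per card the player to move could legally play (sketch)."""
    player = node.state.player_to_move()                    # hypothetical helper
    if player == node.state.smart_player:
        hand = node.state.hand                              # our own cards are known
    else:
        hand = node.state.possible_cards(player)            # cards this opponent might hold
    for card in legal_cards(hand, node.state.current_trick):  # hypothetical helper
        child_state = node.state.play(card)                 # copy of the state after the card
        node.children.append(Node(child_state, parent=node, move=card))
    return node.children
```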

Other tools that I implemented:

  • A method that gives the legal cards of a hand (based on the current trick).
  • A method that gives the smart player all the possible cards of another player (based on the current and previous tricks).
  • A method that generates every possible next (legal) trick (based on the 2 previous methods).
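
Very roughly, the third method composes the first two like this (a sketch with illustrative names, not my exact interfaces):

```python
def possible_next_tricks(state: RoundState) -> list[list[Card]]:
    """Enumerate every legal way the current trick could be completed (sketch)."""
    partials = [list(state.current_trick)]
    for player in state.players_left_in_trick():            # hypothetical helper
        extended = []
        for partial in partials:
            candidates = state.possible_cards(player)        # cards this player might still hold
            for card in legal_cards(candidates, partial):    # legal w.r.t. the partial trick
                if card not in partial:                      # each card is played at most once
                    extended.append(partial + [card])
        partials = extended
    return partials
```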

Problem:

If I want to apply MCTS, I need to ensure that the succession of simulated remaining tricks (or played cards) until the end of the round (i.e. the path in the tree) does not hit a dead end before the final (32nd) card is reached. Indeed, a player could run out of legal cards during the simulation...

How can I deal with that constraint in an MCTS context?

Is MCTS even a viable algorithm for what I want to achieve?

I could of course precompute the whole possible tree ahead of time, but its size would be huge, and much of the point of using MCTS would be lost.

  • How do the game rules deal with situations where there are no legal cards left for a player? – Todd Sewell Nov 23 '22 at 15:08
  • There is always at least one legal card in every hand. The rules only specify a "priority order" of cards in a hand with respect to the current trick. "If you don't have these cards in your hand, then you should play these, if you still don't have these cards, then you have to play these... etc.". The legal cards in a hand depend on the current trick and the whole hand itself. – Betcha Nov 23 '22 at 19:58
  • This is in real life. Now, in my MCTS implementation it's different because I make the opponents play cards that they *possibly* have, but the smart player doesn't know their actual hands of course. – Betcha Nov 23 '22 at 20:09
  • Ah, I see. Could you randomly sample one possible set of cards at the start of the rollout and then keep taking cards from that set until the end of the rollout? More generally with hidden information everything becomes a lot more complicated game-theoretically, with things like bluffing and trading off good moves vs hiding information. Unfortunately I don't think there are any simple extensions to MCTS to deal with this (yet). – Todd Sewell Nov 24 '22 at 11:17
  • If I understand correctly, you suggest distributing the 24 remaining cards (32 - 8 known cards from the smart player's hand) among the 3 other players and rolling out from this node, then repeating N times with different initial distributions. I've tried that (roughly the approach sketched below), but unfortunately it did not work very well. I suppose that is because there are $\frac{(3 \times 8)!}{(8!)^3} \approx 9$ billion possible initial distributions, and even a 1-card change can have a great impact. If not with MCTS, do you see any possible way to achieve that? Reinforcement learning? Thanks. – Betcha Nov 26 '22 at 09:20
  • Yeah, I'm not too surprised it doesn't work that well; naive MCTS seems a bit hopeless in this case. For hidden-information games, two recent approaches (indeed using reinforcement learning) are [Player Of Games](https://arxiv.org/abs/2112.03178) and [ReBeL](https://arxiv.org/abs/2007.13544), so consider looking into those or some of the related work they cite. – Todd Sewell Nov 26 '22 at 18:13
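
A minimal sketch of the sampling (determinization) idea discussed above, assuming hypothetical helpers for dealing and scoring; with a fixed deal every player always has at least one legal card, so a rollout cannot dead-end:

```python
import random

def determinized_rollout(state: RoundState, n_samples: int = 10) -> float:
    """Sample full deals consistent with what the smart player knows and play them out."""
    total = 0.0
    for _ in range(n_samples):
        # Distribute the 24 unseen cards over the 3 opponents, respecting the
        # constraints inferred from the current and previous tricks.
        deal = sample_consistent_deal(state)                 # hypothetical helper
        sim = state.with_opponent_hands(deal)                # hypothetical: full-information copy
        while not sim.round_over():
            player = sim.player_to_move()
            card = random.choice(legal_cards(sim.hand_of(player), sim.current_trick))
            sim = sim.play(card)
        total += sim.score(state.smart_player)               # reward from the smart player's view
    return total / n_samples
```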

0 Answers