MCTS: Units away from the action

Question

I'm trying to implement Monte Carlo Tree Search (MCTS) for a simplified version of the board game Commands and Colors. I'm setting up a scenario where the AI side has overwhelming force: 6 units against the 3 units played by the human.

I would expect MCTS to move all 6 units in for the kill; instead, some units attack, some move sideways, and some retreat.

I suspect that the units in the front are already strong enough to make victory likely, so the unit in the back sees no difference from moving closer to the action, and chooses to move away from it. More precisely, when MCTS evaluates the moves of the far-away unit, the "noise" from the front units, whose actions make the value of the position swing heavily, makes it very hard to detect the much smaller contribution that the rear unit's move makes to the quality of the position.
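The noise hypothesis can be made concrete with a back-of-the-envelope calculation. Suppose each playout's final score has standard deviation sigma (dominated by the front units' combats), and the rear unit's two candidate moves differ in true value by a small delta. The difference of two sample means over n playouts each has standard error sigma * sqrt(2 / n), so separating the moves at roughly 95% confidence needs n >= 2 * (z * sigma / delta)^2 playouts per move. The sketch below assumes independent Gaussian playout noise; the values sigma = 10 and delta = 0.5 are made up for illustration and are not measured from the game.

```python
import math
import random


def required_playouts(sigma: float, delta: float, z: float = 1.96) -> int:
    """Playouts per move before a true gap `delta` between two sibling moves
    exceeds `z` standard errors of the estimated difference.

    Difference of two means of n playouts each has std sigma * sqrt(2 / n);
    solving delta >= z * sigma * sqrt(2 / n) gives n >= 2 * (z * sigma / delta) ** 2.
    """
    return math.ceil(2 * (z * sigma / delta) ** 2)


def simulate(n: int, sigma: float, delta: float,
             trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which the truly better move also has the higher
    sample mean after n playouts each (hypothetical Gaussian noise model)."""
    rng = random.Random(seed)
    # The mean of n i.i.d. Gaussian playouts is Gaussian with std sigma / sqrt(n),
    # so we sample the two sample means directly instead of every playout.
    std_of_mean = sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        better = rng.gauss(delta, std_of_mean)
        worse = rng.gauss(0.0, std_of_mean)
        if better > worse:
            hits += 1
    return hits / trials


# sigma and delta are illustrative numbers, not measured from the game.
print(required_playouts(sigma=10, delta=0.5))  # 3074
print(simulate(10, sigma=10, delta=0.5))       # close to a coin flip
print(simulate(3074, sigma=10, delta=0.5))     # close to 0.975
```

Under these assumptions, a few dozen visits per child leave the rear unit's choice close to a coin flip, which would match the sideways and backwards moves described above.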

This is sad! A human player would move all units towards the enemy, because if the front units get damaged, they will be withdrawn from the front and replaced by second-line units. Having units move randomly away from the action makes no sense.

How do I fix it?

-- Edit:

The source code is here: https://github.com/xpmatteo/auto-cca
The case that does not work as expected can be reproduced with:

1. make server
2. (in another terminal) make open
3. Click "End Phase" twice.

The lone brown (Carthaginian) unit should close in on the gray (Roman) ones, but it doesn't.

Do you have one centralized logic for all the units, or a separate logic per piece? From the way you describe it, does each piece have its own agent? – Oren, Jun 28 '23 at 12:39
Unit logic is centralized -- at every step I generate the set of possible moves, which are the same for all units: they can move one hex each. Once a unit has moved, it's "spent" for the current turn and cannot move again, so the set of available moves is reduced by one — xpmatteo, Jun 29 '23 at 07:40
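The move generation described in this comment can be sketched as follows. Everything here is a hypothetical representation (axial hex coordinates, no board edges, terrain, or zones of control), not the code in the auto-cca repository: each unspent unit may step to one adjacent empty hex, and applying a move marks that unit as spent.

```python
from dataclasses import dataclass, field

# Hypothetical representation: axial hex coordinates, no board edges or terrain.
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


@dataclass
class State:
    positions: dict[str, tuple[int, int]]         # unit id -> (q, r)
    spent: set[str] = field(default_factory=set)  # units that already moved this turn


def legal_moves(state: State) -> list[tuple[str, tuple[int, int]]]:
    """All (unit, destination) pairs: each unspent unit may step to one
    adjacent hex that no other unit occupies."""
    occupied = set(state.positions.values())
    moves = []
    for unit, (q, r) in state.positions.items():
        if unit in state.spent:
            continue
        for dq, dr in HEX_DIRECTIONS:
            dest = (q + dq, r + dr)
            if dest not in occupied:
                moves.append((unit, dest))
    return moves


def apply_move(state: State, move: tuple[str, tuple[int, int]]) -> State:
    """Return a new state with the unit moved and marked as spent."""
    unit, dest = move
    positions = dict(state.positions)
    positions[unit] = dest
    return State(positions, state.spent | {unit})
```

For example, with two distant units `legal_moves(State({"a": (0, 0), "b": (5, 5)}))` offers 12 moves; after `apply_move(..., ("a", (1, 0)))` only unit `b`'s 6 moves remain.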

Oren · Answer 1 · 2023-06-28T15:44:50.253


One possibility is that the RL is not there yet: you may need to continue the training, or add to the win/lose reward some tiny "hints" about what a good move is (e.g. +0.001 when you eliminate an enemy unit, -0.001 when one of your units is eliminated, etc.).
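A minimal sketch of that reward shaping, assuming the playout reports a win/lose outcome plus counts of eliminated units. The function name and its parameters are hypothetical; the 0.001 weight follows the suggestion above. The hints must stay small enough that they can never outweigh the actual result: with only a handful of units on the board, the total hint stays far below the +/-1 terminal reward.

```python
def shaped_reward(won: bool, lost: bool, enemies_eliminated: int,
                  own_units_lost: int, hint: float = 0.001) -> float:
    """Terminal win/lose reward plus tiny per-unit hints.

    Hypothetical helper: +1 for a win, -1 for a loss, 0 otherwise, then
    +hint per enemy unit eliminated and -hint per own unit lost.
    """
    terminal = 1.0 if won else (-1.0 if lost else 0.0)
    return terminal + hint * enemies_eliminated - hint * own_units_lost


print(shaped_reward(won=True, lost=False, enemies_eliminated=3, own_units_lost=1))
```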

I assume that all the moves of your units are taken in one "action" of your army, and that you evaluate the reward for all the units together, as a single "army".
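Treating the army's whole turn as one "action" means each tree node chooses one option per unit at once. A minimal sketch, assuming each unit's options can be listed independently (it ignores two units moving into the same hex); the unit ids and move labels are hypothetical:

```python
from itertools import product


def joint_actions(options: dict[str, list[str]]) -> list[dict[str, str]]:
    """Enumerate every combined army order: exactly one option per unit.

    `options` maps a hypothetical unit id to that unit's candidate moves;
    the result has one entry per element of the Cartesian product.
    """
    units = sorted(options)  # fixed unit order so the output is reproducible
    return [
        dict(zip(units, combo))
        for combo in product(*(options[u] for u in units))
    ]


for action in joint_actions({"u1": ["advance", "hold"], "u2": ["left", "right", "hold"]}):
    print(action)
```

The drawback of this modelling is that the number of joint actions is the product of the per-unit option counts, so it grows very quickly with army size.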

answered Jun 28 '23 at 12:52 by Oren (edited Jun 28 '23 at 15:44)


The available moves are modelled as individual unit moves, so during a turn the AI moves all its units one by one, then resolves combats one by one, and then the turn is over. The playout ends with an evaluation of the position based on how much damage the AI inflicts on the opponent minus how much damage it receives from the opponent – xpmatteo Jun 29 '23 at 07:43
If you have 6 units with about 4-5 available moves each, there are a lot of move combinations for the AI to choose from – xpmatteo Jun 29 '23 at 07:45
Re "the RL is not there yet": I tried increasing the number of iterations, but it does not seem to find the solution I want. After a while, a "winner" move keeps getting selected even though it's not the move I want, and increasing the iterations does not help – xpmatteo Jun 29 '23 at 07:48
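Rough arithmetic for that branching, under simplifying assumptions: every unit always has exactly m moves, units never block each other, and at each step any unspent unit may be chosen to move. With k unspent units there are k * m choices, so a full turn has 6! * m^6 move sequences, while only m^6 distinct final arrangements exist. Each arrangement is reached through 6! = 720 orderings; unless the tree merges these transpositions or fixes the order in which units move, playout statistics for the same outcome are split across many paths, which may be one contributor to the problem.

```python
import math


def sequential_paths(units: int, moves_per_unit: int) -> int:
    """Move sequences when any unspent unit may move next.

    With k unspent units there are k * moves_per_unit choices, so the total is
    the product over k = units..1, i.e. units! * moves_per_unit ** units.
    """
    return math.factorial(units) * moves_per_unit ** units


def distinct_outcomes(units: int, moves_per_unit: int) -> int:
    """Final arrangements, ignoring the order in which units moved."""
    return moves_per_unit ** units


# Simplifying assumptions: no blocking, every unit always has the same move count.
for m in (4, 5):
    print(m, sequential_paths(6, m), distinct_outcomes(6, m))
# 4 2949120 4096
# 5 11250000 15625
```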
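A "winner" move locking in is consistent with how standard UCT selection behaves, assuming the implementation uses plain UCB1 (the repository may use a variant). The exploration bonus shrinks like 1/sqrt(visits), so once one child's mean is even slightly ahead it collects most of the visits; if the rear unit's options differ by less than the playout noise, which one wins is close to arbitrary, and more iterations mostly reinforce that choice. The constant c is only meaningful relative to the reward scale: c = sqrt(2) is the textbook value for rewards in [0, 1], so with raw damage differences as rewards it may need rescaling. The function names below are hypothetical.

```python
import math


def uct_score(total_value: float, visits: int, parent_visits: int,
              c: float = math.sqrt(2)) -> float:
    """UCB1 score used by plain UCT: mean value plus an exploration bonus.

    Assumes plain UCB1; the auto-cca repository may use a different formula.
    """
    if visits == 0:
        return math.inf  # unvisited children are tried first
    mean = total_value / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + exploration


def select_child(children: list[tuple[str, float, int]], parent_visits: int,
                 c: float = math.sqrt(2)) -> str:
    """Return the name of the child with the highest UCB1 score.

    Each child is a (name, total_value, visits) tuple.
    """
    best = max(children, key=lambda ch: uct_score(ch[1], ch[2], parent_visits, c))
    return best[0]
```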
Can you please share the code? It would make things clearer, and it should be interesting for me and for others. – Oren Jun 29 '23 at 12:41

Shared the code -- have fun! – xpmatteo Jun 29 '23 at 20:34

Oren · Answer 2 · 2023-06-30T19:32:11.703


It is very nice, and I recommend that others play with your code. Really nice. Did I get it right that the Carthaginians are the side with extra power, and that they are the AI? It seems that the Carthaginians always win, so why should they learn to win this way rather than another way? If it matters to you how quickly they win, maybe add some penalty on time?
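One way to implement such a time penalty, assuming the playout records when each combat happened: discount each event's damage balance by gamma^turn, so the same damage counts for more when inflicted sooner and hurts less when taken later. The event format, the function name, and gamma = 0.9 are hypothetical. With this score, a line of play that brings the rear units into combat a turn earlier should score slightly higher than one that achieves the same result later.

```python
def discounted_damage_score(events: list[tuple[int, float, float]],
                            gamma: float = 0.9) -> float:
    """Sum of (damage_dealt - damage_taken) per event, discounted by turn.

    Hypothetical event format: (turn, damage_dealt, damage_taken), where
    turn 0 is the turn currently being decided.
    """
    return sum(gamma ** turn * (dealt - taken) for turn, dealt, taken in events)


early = discounted_damage_score([(0, 1.0, 0.0)])
late = discounted_damage_score([(3, 1.0, 0.0)])
print(early > late)  # True: the same damage is worth more when dealt sooner
```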

answered Jun 30 '23 at 19:25 by Oren (edited Jun 30 '23 at 19:32)


Thanks Oren. The problem is that if I start with equal forces, I always beat the AI, because it does not play very intelligently. – xpmatteo Jun 30 '23 at 20:06
