
I have a hex-map, turn-based wargame featuring WWII carrier battles.

On a given turn, a player may choose to perform a large number of actions. Actions come in many different types; some can be performed independently of each other, while others have dependencies. For example, a player may decide to move one or two naval units, then assign a mission to an air unit (or not), then adjust some battle parameters (or not), and then reorganize a naval task force (or not).

Usually, board games allow players to perform only one action per turn (e.g. Go or chess) or a few very similar actions (e.g. backgammon).

Here, in a single turn, the player may select:

  • several actions,
  • actions of different natures,
  • parameters for each action (e.g. strength, payload, destination).

How could I approach this problem with reinforcement learning? How would I specify a model or train it effectively to play such a game?
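
To make the structure concrete, here is a sketch of how one turn's composite action might be encoded as a Gym-style space. All names and sizes below are invented for illustration; this is not the game's real API.

```python
# Illustrative encoding of one turn's composite action with Gymnasium spaces.
# All identifiers and cardinalities here are assumptions, not the real game.
from gymnasium import spaces

turn_action_space = spaces.Dict({
    # move a naval unit: which unit, and a destination hex
    "move_unit": spaces.MultiDiscrete([50,   # unit id (0 = "no move")
                                       60]), # destination hex index
    # optionally assign a mission to an air unit
    "air_mission": spaces.Dict({
        "assign": spaces.Discrete(2),           # 0 = skip, 1 = assign
        "unit": spaces.Discrete(20),            # which air unit
        "mission": spaces.Discrete(4),          # CAP / strike / search / ferry
        "strength": spaces.Box(0.0, 1.0, (1,)), # fraction of aircraft committed
    }),
    # optionally reorganize a task force
    "reorganize": spaces.Discrete(2),
})

sample = turn_action_space.sample()  # one random composite turn action
```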

Here is a screenshot of the game.

[screenshot of the game]

Here's another.

[second screenshot of the game]

  • This is an old question, but could you please provide the name of this game? Moreover, how are you actually interacting with the game? Is there an available public API that the game developers provide so that users can program their own AI? Maybe this is a game you're developing. It seems that the game is this one: [https://apprecs.com/ios/961994680/carrier-battles-4-guadalcanal](https://apprecs.com/ios/961994680/carrier-battles-4-guadalcanal) or https://carrier-battles.com/. – nbro Jan 17 '21 at 15:15

2 Answers


The master's thesis *Action space representation in combinatorial multi-armed bandits* (2015) seems to provide an answer to my question.

Several algorithms can be used:

  • Naive Monte-Carlo Sampling (NMC)
  • Linear Side Information (LSI)
  • Monte Carlo Tree Search with Hierarchical Expansion (MCTS-HE)
  • MCTS with Dimensional Expansion

The common idea is to divide and conquer: instead of enumerating every joint action, the combinatorial turn is decomposed into sub-decisions that are sampled or searched separately.
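
To illustrate that idea, here is a much-simplified sketch of naive Monte-Carlo sampling over a factored action. This is my own illustrative reduction, not code from the thesis; the full algorithm additionally maintains a global bandit over the tried joint actions.

```python
# Simplified naive Monte-Carlo sampling over a factored action space.
# Each action "dimension" is treated as an independent local bandit; with
# probability epsilon we try a fresh combination, otherwise we exploit the
# per-dimension value estimates.
import random
from collections import defaultdict

def naive_mc(dimensions, evaluate, n_samples=1000, epsilon=0.3):
    """dimensions: list of lists of legal choices for each sub-action.
    evaluate: runs one rollout for a joint action and returns a reward."""
    value = [defaultdict(float) for _ in dimensions]  # per-choice mean reward
    count = [defaultdict(int) for _ in dimensions]
    best, best_reward = None, float("-inf")
    for _ in range(n_samples):
        if best is None or random.random() < epsilon:      # explore
            joint = tuple(random.choice(d) for d in dimensions)
        else:                                              # exploit local bandits
            joint = tuple(max(d, key=lambda a, i=i: value[i][a])
                          for i, d in enumerate(dimensions))
        reward = evaluate(joint)
        for i, a in enumerate(joint):                      # update running means
            count[i][a] += 1
            value[i][a] += (reward - value[i][a]) / count[i][a]
        if reward > best_reward:
            best, best_reward = joint, reward
    return best
```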

Here's a screenshot of the HE algorithm from section 4.2.

[screenshot of the HE algorithm from section 4.2 of the thesis]

– nbro

One very popular RL algorithm capable of predicting multiple action outputs concurrently is Proximal Policy Optimization (PPO). There, the policy network can predict $n$ tuples of outputs $(\mu, \sigma)$ at once (i.e. $2n$ output nodes), where each tuple parameterizes a Gaussian distribution from which the respective action value is sampled. The sampled action values are then applied in the simulation/game. With a slight modification (e.g. predicting categorical logits instead of Gaussian parameters), the same procedure applies equally well to discrete action spaces.
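
As a concrete illustration of that parameterization, here is a minimal PyTorch sketch; sizes and layer choices are placeholders, not a reference implementation.

```python
# Gaussian policy head: for n continuous action components the network
# emits 2*n values (n means and n log standard deviations).
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, n_actions)         # n means
        self.log_sigma = nn.Linear(hidden, n_actions)  # n log std-devs

    def forward(self, obs):
        h = self.body(obs)
        dist = torch.distributions.Normal(self.mu(h), self.log_sigma(h).exp())
        action = dist.sample()                # one sample per action component
        return action, dist.log_prob(action)  # log-prob feeds the PPO objective
```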

To get you started, multiple high-quality implementations of PPO are available for rapid prototyping, e.g. OpenAI baselines or stable-baselines.
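
For instance, with Stable-Baselines3 (the maintained PyTorch successor of stable-baselines), training could look roughly like this. `CarrierEnv` is a hypothetical Gym wrapper around the game; note that SB3's PPO expects a flat `Box`/`Discrete`/`MultiDiscrete` action space rather than a nested `Dict`.

```python
from stable_baselines3 import PPO

env = CarrierEnv()  # hypothetical Gym wrapper around the game
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)

obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)
```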

– Daniel B.