I was recently introduced to a simple board game called "Battle Sheep". In this game, two to four players try to occupy as many tiles of a hex grid as possible.
You can find the complete rules here.
I decided to program a game engine for this game. You can find it here. I also created an SFML-based graphical interface for the game, which can be found in the same repo, here.
The reason I created the game engine in the first place was a challenge to myself: I wanted to build an AI with superhuman skill in this game, since the game seemed simple enough for a neural network to master. I have been studying work on deep reinforcement learning, especially Leela Chess Zero (AlphaZero) and AlphaGo. I thought that a DQN should be able to master Battle Sheep, but I have stumbled upon a problem:
Let's consider a game with two players. Since the game has two phases (map creation and the gameplay itself), the action space is unknown at the start of the game, because the two players can place their pasture boards however they wish, as long as the placement follows the rules (see the link above). In my game engine, I have solved this by creating a map large enough that any possible piece placement from the map creation phase fits inside its bounds. During the map creation phase, the chosen hexes are set "active" (HEX_FREE in my engine); "inactive" hexes cannot be used and are treated as "out of bounds" during gameplay (HEX_OOB in my engine). For this code, see the constructor of my Map class.
Now the question itself: deep Q-learning requires a fixed-size action space, which is hard to define in Battle Sheep, since the map is different in every game. One possibility would be to define the action space over the complete oversized map (including the "inactive" hexes in my game engine) and enumerate every possible move on that map. However, for a map of, say, 40 width and 40 height, the action space would be
40 x 40 x 6 x (40?) x 16
40 x 40 - from which hex the player starts the move
6 - in which direction the player moves
40? - how far the pieces move (is this even needed, since you are always forced to move as far as possible?)
16 - how many pieces the player moves
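To make the size of this action space concrete, here is a sketch of how a move over the oversized map could be flattened into a single action index. The constants are just the ones from the example above, not values from my engine:

```cpp
#include <cstdio>

// Illustrative constants: 40x40 grid, 6 directions, up to 16 sheep moved.
constexpr int W = 40, H = 40, DIRS = 6, MAX_SHEEP = 16;

// Without encoding the move distance: 40 * 40 * 6 * 16 = 153600 actions.
constexpr int ACTIONS = W * H * DIRS * MAX_SHEEP;

// Flatten (start hex, direction, number of sheep moved) into [0, ACTIONS).
int actionIndex(int x, int y, int dir, int sheep /* 1..MAX_SHEEP */) {
    return ((y * W + x) * DIRS + dir) * MAX_SHEEP + (sheep - 1);
}

int main() {
    std::printf("actions without distance: %d\n", ACTIONS);     // 153600
    std::printf("actions with distance:    %d\n", ACTIONS * W); // 6144000
}
```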
Without the extra x40, the action space yields 153,600 different moves, and with the extra 40 (move length) it has about 6.1 million possible moves. This is surely too large. How should I tackle this problem?