I am writing a simple toy game with the intent of training a deep neural network on top of it. The game's rules are roughly the following:
- The game has a board made up of hexagonal cells.
- Both players have the same collection of pieces that they can choose to position freely on the board.
- Placing different types of pieces awards points (or decreases the opponent's points) depending on their position and configuration with respect to one another.
- Whoever has more points wins.
There are additional rules (about turns, the number and types of pieces, etc.), but they are not important in the context of this question. I want to devise a deep neural network that can iteratively learn by playing against itself. My questions are about the representation of the input and output. In particular:
- Since the pattern of pieces matters, I was thinking of using at least some convolutional layers. The board can vary in size but is in principle very small (6x10 in my tests, to be expanded by a few cells). Does this make sense? What kind of pooling can I use?
- How do I represent both sides? In this paper about Go, the authors use two input matrices, one for white stones and one for black stones. Can that work in this case too? Remember, though, that I have different types of pieces, say A, B, C and D. Should I use 2x4 = 8 input matrices, one per player and piece type (see the sketch after this list)? That seems very sparse and inefficient to me, and I fear it will be far too sparse for the convolutional layers to work well.
- I thought the output could be a probability distribution over the matrix representing board positions, plus a separate array of probabilities indicating which piece to play. However, I also need to represent the ability to pass the turn, which is very important. How can I do that without diluting its significance among the other probabilities?
- And most importantly, do I reinforce winning moves only, or losing moves too? Reinforcing winning moves is easy because I just set the desired probability of the played move to 1. But when the move led to a loss, what can I do? Set that move's probability to 0 and distribute the remaining probability equally among all the other moves (also sketched below)? And does it make sense to weight moves by the final score difference, even though this goes against the meaning of the outputs, which are roughly probabilities?
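To make the representation questions above concrete, here is a rough sketch of what I currently have in mind, in plain Node.js. All the names (`encodeBoard`, `makePolicyTarget`, `ROWS`, `COLS`, the cell layout) are my own placeholders, not from any library, and the target-construction function is exactly the scheme I am asking about, not something I know to be correct:

```js
const ROWS = 6, COLS = 10;
const PIECE_TYPES = ['A', 'B', 'C', 'D'];
const NUM_PLANES = 2 * PIECE_TYPES.length;                 // one binary plane per (side, piece type)
const NUM_ACTIONS = ROWS * COLS * PIECE_TYPES.length + 1;  // every (cell, piece) pair, plus "pass" as the last index

// Input encoding: 8 sparse binary planes, analogous to the two-plane encoding used for Go.
// The board is assumed to be a 6x10 grid where each cell is either null
// or an object like { player: 0|1, type: 'A'|'B'|'C'|'D' }.
function encodeBoard(board, currentPlayer) {
  const x = new Float32Array(ROWS * COLS * NUM_PLANES);    // flat [row][col][plane] order
  for (let r = 0; r < ROWS; r++) {
    for (let c = 0; c < COLS; c++) {
      const cell = board[r][c];
      if (!cell) continue;
      const side = cell.player === currentPlayer ? 0 : 1;  // 0 = side to move, 1 = opponent
      const plane = side * PIECE_TYPES.length + PIECE_TYPES.indexOf(cell.type);
      x[(r * COLS + c) * NUM_PLANES + plane] = 1;
    }
  }
  return x;
}

// Output target as I currently imagine it: push the played action toward 1 when
// the game was won; when it was lost, zero it out and spread the mass uniformly
// over all remaining actions.
function makePolicyTarget(playedAction, won) {
  const target = new Float32Array(NUM_ACTIONS);
  if (won) {
    target[playedAction] = 1;
  } else {
    target.fill(1 / (NUM_ACTIONS - 1));
    target[playedAction] = 0;
  }
  return target;
}
```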
Also, I developed the game engine in Node.js, intending to use Synaptic as the framework, but I am not sure it can work with convolutional networks (I doubt there is a way to tie the weights associated with local receptive fields). Any advice on other libraries that are compatible with Node?
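For reference, this is roughly the kind of architecture I am imagining, written against TensorFlow.js with the `@tensorflow/tfjs-node` backend, which as far as I can tell does provide convolutional layers under Node. The layer sizes are placeholders, and folding the cell, the piece type and "pass" into a single softmax (plus a separate value head) is just one possible layout, not a design I am committed to:

```js
const tf = require('@tensorflow/tfjs-node');

const ROWS = 6, COLS = 10, NUM_PLANES = 8; // 2 players x 4 piece types

const input = tf.input({ shape: [ROWS, COLS, NUM_PLANES] });

// Two small conv layers; with a 6x10 board, pooling may be unnecessary,
// so the spatial resolution is kept for the per-cell policy output.
let x = tf.layers.conv2d({ filters: 32, kernelSize: 3, padding: 'same', activation: 'relu' }).apply(input);
x = tf.layers.conv2d({ filters: 32, kernelSize: 3, padding: 'same', activation: 'relu' }).apply(x);
const flat = tf.layers.flatten().apply(x);

// Policy head: one entry per (cell, piece type) plus one extra entry for "pass",
// so passing competes with every placement inside a single softmax.
const policy = tf.layers.dense({
  units: ROWS * COLS * 4 + 1,
  activation: 'softmax',
  name: 'policy',
}).apply(flat);

// Value head: expected outcome in [-1, 1], trained on the final result,
// as an alternative to forcing the policy probabilities themselves toward 0 or 1.
const value = tf.layers.dense({ units: 1, activation: 'tanh', name: 'value' }).apply(flat);

const model = tf.model({ inputs: input, outputs: [policy, value] });
model.compile({
  optimizer: 'adam',
  loss: { policy: 'categoricalCrossentropy', value: 'meanSquaredError' },
});
```

Would something along these lines be a reasonable way to structure the network, or is there a better-suited Node library for this kind of architecture?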