
I am writing a simple toy game with the intent of training a deep neural network on it. The game's rules are roughly the following:

  • The game has a board made up of hexagonal cells.
  • Both players have the same collection of pieces that they can choose to position freely on the board.
  • Placing different types of pieces awards points (or decreases the opponent's points) depending on their position and configuration with respect to one another.
  • Whoever has more points wins.

There are additional rules (about turns, number and types of pieces, etc.) but they are not important in the context of this question. I want to devise a deep neural network that can iteratively learn by playing against itself. My questions are about the representation of the input and output. In particular:

  • Since patterns of pieces matter, I was thinking of having at least some convolutional layers. The board can be of various sizes but is in principle very small (6x10 in my tests, to be expanded by a few cells). Does this make sense? What kind of pooling can I use?
  • How do I represent both sides? In this paper about Go, the authors use two input matrices, one for white stones and one for black stones. Can that work in this case too? But remember that I have different types of pieces, say A, B, C and D. Should I use 2x4 input matrices? That seems very sparse and inefficient to me; I fear it will be way too sparse for the convolutional layers to work.
  • I thought the output could be a distribution of probabilities over the matrix representing board positions, plus a separate array of probabilities indicating which piece to play. However, I also need to represent the ability to pass the turn, which is very important. How can I do that without diluting its significance among the other probabilities? (The sketch after this list shows roughly what I have in mind.)
  • And most importantly, do I enforce winning moves only, or losing moves too? Enforcing winning moves is easy because I can just set the desired probabilities to 1. But when losing, what can I do? Set that move's probability to 0 and all the others to the same value? Also, does it make sense to weight enforced moves by the final score difference, even though this would go against the meaning of the outputs, which are roughly probabilities?
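
For concreteness, this is roughly the output shape I have in mind (a TypeScript sketch against my node.js engine; the sizes are from my 6x10 test board and the names are just placeholders):

    // Rough sketch of the output representation I have in mind.
    // Sizes are placeholders for my 6x10 test board and 4 piece types (A, B, C, D).
    const ROWS = 6;
    const COLS = 10;
    const PIECE_TYPES = 4;

    interface NetworkOutput {
      placement: number[]; // probability per cell, flattened row-major: ROWS * COLS values
      piece: number[];     // probability per piece type: PIECE_TYPES values
      pass: number;        // probability of passing the turn -- the part I am unsure about,
                           // since it competes with all the placement probabilities above
    }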

Also, I developed the game engine in node.js, thinking I would use Synaptic as the framework, but I am not sure it can work with convolutional networks (I doubt there's a way to fix the weights associated with local receptive fields). Any advice on other libraries that are compatible with node?

Totem

2 Answers

  • To represent the pieces, you should be able to use a single input matrix. Just designate an integer for each type of piece: white pieces can be positive integers and black pieces negative (see the sketch after this list).

  • You can use sigmoid for board position confidence and linear activation for piece identifier. pass would be another sigmoid output. I don't think you'll have to worry about pass being diluted. Since it is such a valuable action, the score will depend a lot on the pass output and it will have a large gradient. If you need to select the pass action with high frequency for reinforcement learning purposes, then just attribute a higher probability to the pass action in your random choice function.

  • The final score difference has a large impact on the desirability of the moves: a large score difference should produce a correspondingly large training signal. Therefore you might want to include the magnitude of the score difference in your loss function.
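
A rough sketch of that encoding (TypeScript, since your engine is in node; the piece-to-integer mapping here is made up, so adapt it to your engine):

    // Single-matrix encoding: one integer per cell.
    // 0 = empty; piece types A..D map to 1..4 for white and -1..-4 for black.
    type Cell = { piece: 'A' | 'B' | 'C' | 'D'; owner: 'white' | 'black' } | null;

    const PIECE_CODE: Record<'A' | 'B' | 'C' | 'D', number> = { A: 1, B: 2, C: 3, D: 4 };

    function encodeBoard(board: Cell[][]): number[][] {
      return board.map(row =>
        row.map(cell => {
          if (cell === null) return 0;
          const code = PIECE_CODE[cell.piece];
          return cell.owner === 'white' ? code : -code;
        })
      );
    }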

This is the type of job that Deep Q Learning does. Perhaps you'll want to look into that too.
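
For reference, the core of a one-step Q-learning target looks roughly like this (a generic sketch, not tied to any library; the discount factor and the reward definition are up to you):

    // One-step Q-learning target for a single transition.
    // `nextStateValues` stands in for a forward pass of the network on the next state,
    // returning one estimated value per legal action (including pass).
    function qTarget(
      reward: number,            // e.g. points gained by the move, or win/loss at the end
      nextStateValues: number[], // empty if the game is over
      gamma = 0.99               // discount factor (placeholder value)
    ): number {
      if (nextStateValues.length === 0) return reward;      // terminal state
      return reward + gamma * Math.max(...nextStateValues); // bootstrapped target
    }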


You don't need conv layers, since you don't feed a picture as input (see below). Alternatively, you can try using a picture of the board (with different pieces having different shapes); this can work too. Then I would go for 2 conv layers, stride 1, with a kernel size equal to half a piece's size. I would try it with a single max pooling.
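
If you do go the picture route and want something that runs under node, the architecture could look like the following sketch with TensorFlow.js (just one option; the image size, filter counts and kernel size are placeholders):

    import * as tf from '@tensorflow/tfjs';

    // Sketch of the suggested architecture: 2 conv layers (stride 1) + a single max pooling.
    // Input is a rendered picture of the board, e.g. 60x100 pixels, 1 channel (placeholder).
    const model = tf.sequential();
    model.add(tf.layers.conv2d({
      inputShape: [60, 100, 1], // [height, width, channels] of the board picture
      filters: 16,
      kernelSize: 5,            // roughly half the pixel size of one piece
      strides: 1,
      activation: 'relu',
    }));
    model.add(tf.layers.conv2d({ filters: 32, kernelSize: 5, strides: 1, activation: 'relu' }));
    model.add(tf.layers.maxPooling2d({ poolSize: 2 }));
    model.add(tf.layers.flatten());
    model.add(tf.layers.dense({ units: 6 * 10 * 4 + 1 })); // n moves + 1 pass (see below), linear
    model.compile({ optimizer: 'adam', loss: 'meanSquaredError' });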

Unlike the other answer, I would suggest using a 3D tensor as input, with the number of channels equal to the number of different piece types. The other two dimensions would correspond to the number of cells on the board. The various transformations in your NN will not be able to distinguish between multiple integer values very well; that's why it is better to have a one-hot encoding of the piece types.
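
A sketch of that encoding (here I use one channel per piece type and player, which is one way to cover both sides; the indexing is arbitrary):

    // One-hot / multi-channel board encoding: shape [channels][rows][cols].
    // One channel per (piece type, owner) pair: 4 piece types x 2 players = 8 channels here.
    type Piece = { type: 0 | 1 | 2 | 3; owner: 0 | 1 } | null;

    function encodeOneHot(board: Piece[][], rows: number, cols: number): number[][][] {
      const channels = 4 * 2;
      const tensor = Array.from({ length: channels }, () =>
        Array.from({ length: rows }, () => new Array<number>(cols).fill(0))
      );
      for (let r = 0; r < rows; r++) {
        for (let c = 0; c < cols; c++) {
          const piece = board[r][c];
          if (piece !== null) {
            tensor[piece.owner * 4 + piece.type][r][c] = 1; // exactly one channel gets a 1
          }
        }
      }
      return tensor;
    }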

For the output I would use only a vector with n+1 components: n for all possible moves, and 1 for the pass. It would encode the expected reward for each move, not a probability.
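
Concretely, the mapping between output indices and moves could look like this (a sketch; the cell-major ordering is just one possible choice):

    // Map between a flat output index and a concrete move.
    // Layout: indices 0..N_MOVES-1 are (cell, piece) placements, index N_MOVES is pass.
    const ROWS = 6, COLS = 10, PIECE_TYPES = 4;
    const N_MOVES = ROWS * COLS * PIECE_TYPES;

    type Move = { kind: 'place'; row: number; col: number; piece: number } | { kind: 'pass' };

    function indexToMove(i: number): Move {
      if (i < 0 || i > N_MOVES) throw new RangeError('invalid move index');
      if (i === N_MOVES) return { kind: 'pass' };
      const piece = i % PIECE_TYPES;
      const cell = Math.floor(i / PIECE_TYPES);
      return { kind: 'place', row: Math.floor(cell / COLS), col: cell % COLS, piece };
    }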

I'm not sure what you mean by enforcing moves. But when you train it with something like Q-learning, it would make sense to make a completely random move every once in a while with a certain probability (say 10% of the time). Look up https://en.wikipedia.org/wiki/Reinforcement_learning
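
That exploration step would look roughly like this (a sketch; epsilon = 0.1 matches the 10% above):

    // Epsilon-greedy action selection over the network's n+1 expected-reward outputs.
    function chooseAction(expectedRewards: number[], epsilon = 0.1): number {
      if (Math.random() < epsilon) {
        // Explore: completely random move (including pass).
        return Math.floor(Math.random() * expectedRewards.length);
      }
      // Exploit: pick the move with the highest predicted reward.
      return expectedRewards.indexOf(Math.max(...expectedRewards));
    }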

hellmean