
From the MuZero paper (Appendix E, page 13):

In chess, 8 planes are used to encode the action. The first one-hot plane encodes which position the piece was moved from. The next two planes encode which position the piece was moved to: a one-hot plane to encode the target position, if on the board, and a second binary plane to indicate whether the target was valid (on the board) or not. This is necessary because for simplicity our policy action space enumerates a superset of all possible actions, not all of which are legal, and we use the same action space for policy prediction and to encode the dynamics function input. The remaining five binary planes are used to indicate the type of promotion, if any (queen, knight, bishop, rook, none).

Is the second binary plane all zeros or all ones? Or, something else? How is it known if the move is off the board? For my game, I know if it is a legal move on the board, but do not know if the move is off the board.


1 Answer


Is the second binary plane all zeros or all ones? Or, something else? How is it known if the move is off the board? For my game, I know if it is a legal move on the board, but do not know if the move is off the board.

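The one-hot plane is the first of the two target planes: it has a single one at the target square and zeros everywhere else, otherwise it would not be "one-hot". The second plane is described only as "binary", and the paper does not say how it is filled. A natural reading is a constant plane, all ones when the target is on the board and all zeros when it is not, but that is an assumption rather than something the paper states.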
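To make the quoted description concrete, here is a minimal sketch of one way the 8 planes could be built. It is not taken from the paper: the function name `encode_action`, the plane order, and above all the choice to fill the validity plane and the promotion planes uniformly (all ones or all zeros) are assumptions for illustration.

```python
BOARD = 8
# Order of the five promotion planes is an assumption, taken from the
# order in which the paper lists them.
PROMOTIONS = ("queen", "knight", "bishop", "rook", "none")


def filled(value):
    """Return an 8x8 plane filled with a single value."""
    return [[value] * BOARD for _ in range(BOARD)]


def encode_action(src, dst, promotion="none"):
    """Encode a move as 8 planes of 8x8 (hypothetical helper).

    src and dst are (row, col) pairs; dst may lie outside the board.
    """
    planes = [filled(0) for _ in range(8)]

    # Plane 0: one-hot source square.
    planes[0][src[0]][src[1]] = 1

    on_board = 0 <= dst[0] < BOARD and 0 <= dst[1] < BOARD
    if on_board:
        # Plane 1: one-hot target square, only set when on the board.
        planes[1][dst[0]][dst[1]] = 1
        # Plane 2: validity flag, ASSUMED to be a constant plane of ones.
        planes[2] = filled(1)

    # Planes 3..7: promotion type, ASSUMED to be constant planes as well.
    planes[3 + PROMOTIONS.index(promotion)] = filled(1)
    return planes
```

With this reading, an off-board target leaves both target planes at zero, so the dynamics function can still tell the action apart from any on-board move.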

The paper doesn't say exactly how to implement the "off the board" check; research papers rarely go down to the code level. However, detecting "off the board" is not a challenging task.

https://webcache.googleusercontent.com/search?q=cache:djj-G4T_PwgJ:https://craftychess.com/hyatt/boardrep.html+&cd=2&hl=en&ct=clnk&gl=au

The next step in board representation evolution is to enclose the board inside a larger array, so that illegal squares are "off" the edge and are easily detectable.

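One possibility is to add a border around your board. Crafty did that. A one-square border (10x10) covers single-step moves, but a knight can land two squares past the edge, so in a plain 2D array the border needs to be two squares wide (12x12). The classic flat "mailbox" layout gets by with 10x12.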
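The border idea can be sketched as follows. This is only an illustration, not Crafty's code: the names `make_padded_board` and `knight_targets`, the sentinel values, and the 12x12 layout (a two-square border so that knight jumps never index outside the array) are all assumptions.

```python
SIZE = 8    # playable board is SIZE x SIZE
PAD = 2     # border width; 2 is enough for a knight jump
OFF = -1    # sentinel stored in border cells
EMPTY = 0   # value stored in playable cells

KNIGHT_DELTAS = [(-2, -1), (-2, 1), (-1, -2), (-1, 2),
                 (1, -2), (1, 2), (2, -1), (2, 1)]


def make_padded_board():
    """Build a 12x12 array whose outer two rings are marked OFF."""
    n = SIZE + 2 * PAD
    board = [[OFF] * n for _ in range(n)]
    for r in range(PAD, PAD + SIZE):
        for c in range(PAD, PAD + SIZE):
            board[r][c] = EMPTY
    return board


def knight_targets(board, row, col):
    """List every knight target from (row, col) in 0..7 coordinates.

    Returns ((target_row, target_col), on_board) pairs; off-board
    targets are detected by reading the sentinel, with no range checks.
    """
    result = []
    for dr, dc in KNIGHT_DELTAS:
        r, c = row + dr, col + dc
        on_board = board[r + PAD][c + PAD] != OFF
        result.append(((r, c), on_board))
    return result
```

For a knight in the corner, `knight_targets(make_padded_board(), 0, 0)` returns all eight candidate targets, of which only (1, 2) and (2, 1) are flagged as on the board.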

Exactly how you do it is implementation-defined. We don't know what Google did because AlphaZero is not open source. I'm just giving you an example here.
