
I am interested in creating a neural-network-based chess engine. It uses an $8 \times 8 \times 73$ output space to encode every possible move, as proposed in the AlphaZero paper, "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm".

However, when I run the network, the first move it selects is illegal. How should this be handled? I see two options.

  1. Pick the next-highest-scoring move until a legal one is found. In this case, the network might learn over time not to rank illegal moves first.
  2. Treat the game as a loss for the player who picked the illegal move. The disadvantage is that the network might get 'stuck' on only a few legal moves.

What is the preferred solution to this particular problem?

nbro
whits

1 Answer


You should have a method that generates the set of legal moves for the current board state. Use it as a mask on the policy head's outputs before normalization, so that illegal moves receive zero probability and can never be selected.
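As a rough illustration of the masking step, here is a minimal NumPy sketch. It assumes the policy head produces raw logits of shape `(8, 8, 73)` and that you already have a boolean `legal_mask` of the same shape from your own move generator. The function name `masked_policy` and the two legal-move indices in the toy example are made up for illustration and are not taken from the paper.

```python
import numpy as np

def masked_policy(logits, legal_mask):
    """Softmax restricted to legal moves; illegal moves get probability 0."""
    logits = np.asarray(logits, dtype=np.float64)
    legal_mask = np.asarray(legal_mask, dtype=bool)
    if not legal_mask.any():
        raise ValueError("no legal moves: the position is terminal")
    # Send illegal logits to -inf so they contribute exp(-inf) = 0.
    masked = np.where(legal_mask, logits, -np.inf)
    # Subtract the largest legal logit for numerical stability.
    masked = masked - masked.max()
    exp = np.exp(masked)
    return exp / exp.sum()

# Toy example: random logits and two arbitrary entries marked as legal.
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 8, 73))
legal_mask = np.zeros((8, 8, 73), dtype=bool)
legal_mask[1, 4, 0] = True    # hypothetical legal move
legal_mask[0, 6, 56] = True   # hypothetical legal move

probs = masked_policy(logits, legal_mask)
best = np.unravel_index(np.argmax(probs), probs.shape)
```

With this in place, both sampling from `probs` and taking the argmax can only return a legal move, so neither of the two workarounds in the question should be needed.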

nbro
mshlis