
I'm implementing a connect-4 agent: using DQN, training by playing vs previous versions of the network.

However, many times the network learns that it's best to simply put 4 pieces in the same column, which is usually able to beat the previous (also naive) versions.

What could be the reason? how can it be dealt with? Thanks!

Ben Lahav

1 Answer


There are many potential reasons why your AI might be learning that it's best to put $4$ pieces in the same column. One possibility is that the AI is simply finding this strategy to be the most effective way to win against its current (also naive) opponents. Another is that the AI is overfitting to its training data and not generalizing well to new situations, i.e. it is not exploring enough during training, which prevents it from discovering new and better strategies.

To avoid such a sub-optimal (naive) solution, it is important to use a state representation that is informative enough to allow the agent to learn a good strategy. If the state representation is not informative enough, the agent will not be able to learn a good strategy from the data it has.

To combat this issue, try feature engineering to make the state representation more informative, for example by adding a binary feature for each column indicating whether or not that column is full. This would encourage the agent to spread its pieces out over the board, rather than dumping them all in the same column.
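As a rough sketch of that idea (assuming your board is a `(6, 7)` NumPy array with `0` for empty cells; the function name and encoding are illustrative, not from your code), you could append one "column full" bit per column to the flattened board:

```python
import numpy as np

ROWS, COLS = 6, 7  # standard Connect-4 dimensions

def augment_state(board):
    """Flatten the board and append a binary 'is this column full?' feature
    for each of the 7 columns. Assumes 0 encodes an empty cell."""
    column_full = (board != 0).all(axis=0).astype(np.float32)  # shape (COLS,)
    return np.concatenate([board.astype(np.float32).ravel(), column_full])

# An empty board yields 6*7 cell features plus 7 zero 'full' flags.
state = augment_state(np.zeros((ROWS, COLS), dtype=np.int8))
assert state.shape == (ROWS * COLS + COLS,)
```

The extra features are redundant in principle (the network could infer fullness from the cells), but making them explicit can speed up learning that the naive single-column strategy dead-ends once the column fills up.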

Another possibility is that the AI is not exploring enough and is getting stuck in a local optimum. One approach is to use a reinforcement learning algorithm that is less likely to get stuck in local optima, such as Q-learning or SARSA. You could also increase the exploration rate during training, which would help the agent escape from local optima.
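For the exploration-rate suggestion, a minimal $\epsilon$-greedy schedule might look like the following (the constants and names such as `q_values` and `EPS_DECAY` are illustrative assumptions, not taken from your setup):

```python
import random

def select_action(q_values, epsilon, num_actions=7):
    """Epsilon-greedy: with probability epsilon pick a random column,
    otherwise pick the column with the highest Q-value."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    return max(range(num_actions), key=lambda a: q_values[a])

# Anneal epsilon from 1.0 (pure exploration) toward 0.05 over training.
EPS_START, EPS_END, EPS_DECAY = 1.0, 0.05, 0.999

epsilon = EPS_START
for episode in range(10_000):
    # ... play one self-play game, choosing moves with select_action(...)
    epsilon = max(EPS_END, epsilon * EPS_DECAY)
```

Keeping a non-zero floor like `EPS_END = 0.05` ensures the agent never fully stops exploring, so a degenerate strategy that only beats old checkpoints keeps getting challenged.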

Faizy
    ok I found out that I was overriding the model after it finished training lol. now it works! However, I appreciate your answer! It clarified some terminology and gave some interesting general ideas. Thanks! – Ben Lahav Oct 19 '22 at 17:26