
I have coded a neural-network-based RL tic-tac-toe agent. It trains well enough to win against random agents almost all the time; the larger the board (the code supports training on NxN boards with winning lines longer than 3), the closer the win rate gets to 100%.
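For context, the generalized win condition on an NxN board with winning length k can be sketched like this (a minimal illustration, not my actual code; the function name `has_win`, the 1/-1/0 board encoding, and the use of numpy are my assumptions):

```python
import numpy as np

def has_win(board, player, k):
    """Check whether `player` has k marks in a row on an NxN board.

    `board` is an NxN numpy array (hypothetical encoding:
    1 for X, -1 for O, 0 for empty).
    """
    n = board.shape[0]
    mask = (board == player)
    # Scan every cell as a potential start of a winning line in
    # four directions: right, down, down-right, down-left.
    for r in range(n):
        for c in range(n):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_r, end_c = r + (k - 1) * dr, c + (k - 1) * dc
                if 0 <= end_r < n and 0 <= end_c < n and all(
                    mask[r + i * dr, c + i * dc] for i in range(k)
                ):
                    return True
    return False
```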

However, the agent is still weak enough for me to beat it easily, even after roughly 100,000 self-play games with a decaying exploration probability on a board like 5x5 with winning length 4. I've also coded an algorithmic agent that makes no obvious blunders, and the neural-network-based agent loses to it at a fatal rate even when playing as X.

I don't use experience replay (on purpose); I only do online learning. The reason is that experience replay on small boards (up to 5x5) amounts to overfitting the dataset, and I want to show that online learning, without many domain-specific tricks, can produce a perfect agent.
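The online-learning setup I mean could be sketched as follows: after every transition a single TD(0) update is applied and the transition is then discarded (no replay buffer), while the exploration probability decays over self-play games. This is a hedged illustration of the general technique, not my code; the function names, the tabular value dictionary, and the decay constants are all assumptions:

```python
def td_update(V, s, s_next, reward, done, alpha=0.1, gamma=1.0):
    """One online TD(0) update on value table V.

    The transition (s, s_next, reward) is used exactly once and then
    thrown away -- this is what distinguishes online learning from
    learning with a replay buffer.
    """
    target = reward if done else gamma * V.get(s_next, 0.0)
    V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))

def decayed_eps(game_idx, eps0=1.0, eps_min=0.05, half_life=10_000):
    """Exploration probability that decays exponentially with the
    number of self-play games, floored at eps_min."""
    return eps_min + (eps0 - eps_min) * 0.5 ** (game_idx / half_life)
```

With a neural network the `V` table would be replaced by a function approximator and the same single-step update becomes one gradient step on the TD error.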

Is there any work, or are there repos, that compare deep learning against a near-perfect agent (which is easy to develop for small boards and simple rules)?

Robin van Hoorn
Emil

0 Answers