I was facing a problem I mentioned in a previous question, but after a while I realized that the problem may lie in the dataset, not in the learning rate.
I build the dataset from white positions only, i.e. the board states where it is white's turn. Each dataset consists of a single game.
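To make the setup concrete, here is a minimal sketch of what I mean; `Game`, `WHITE`, and `game_to_dataset` are placeholder names for illustration, not the actual identifiers from my project:

```python
from dataclasses import dataclass
from typing import List, Tuple

WHITE, BLACK = 0, 1  # placeholder side-to-move flags

@dataclass
class Game:
    positions: List[Tuple[list, int]]  # (board, side_to_move) for every ply
    outcome: float                     # +1 white win, 0 draw, -1 white loss

def game_to_dataset(game: Game) -> Tuple[list, list]:
    """Keep only the positions where it is white's turn."""
    xs, ys = [], []
    for board, side_to_move in game.positions:
        if side_to_move == WHITE:
            xs.append(board)         # in practice the board is encoded as a feature vector
            ys.append(game.outcome)  # every white position is labelled with the final result
    return xs, ys
```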
First, I let the agent play a game and then learn from it immediately, but that did not work: the agent converges to a degenerate style of play (it only wins or loses against itself, or draws in a trivial way).
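Roughly, this first approach is the following loop, where `self_play_one_game` and `train_on_game` are stand-ins for my actual self-play and training routines:

```python
def online_loop(model, self_play_one_game, train_on_game, iterations: int):
    for _ in range(iterations):
        game = self_play_one_game(model)  # the agent plays one game against itself
        train_on_game(model, game)        # ...and the network is updated on that game right away
```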
Second, I let the agent play about 1000 games against itself and then trained on each game separately, but that did not work either.
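The second approach only changes when the updates happen; with the same hypothetical helpers as above, it looks like this:

```python
def batched_self_play_loop(model, self_play_one_game, train_on_game,
                           iterations: int, games_per_iter: int = 1000):
    for _ in range(iterations):
        # generate a whole batch of self-play games first...
        games = [self_play_one_game(model) for _ in range(games_per_iter)]
        # ...but still run one training pass per individual game
        for game in games:
            train_on_game(model, game)
```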
Note: each of the two approaches describes one iteration of the learning process; I let the agent repeat these iterations, so in total it trained on about 50,000 games.
Is my approach wrong? Should the dataset be built in another way? Should I perhaps train the agent on several games at once?
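By the last option I mean something like the sketch below: pooling the positions of many games into one shuffled dataset and fitting on it in mini-batches. `model.fit` assumes a Keras-style API here, and `game_to_dataset` is the helper from the first sketch:

```python
import random

def train_on_pooled_games(model, games, batch_size: int = 64):
    # Flatten every game into one big list of (position, outcome) pairs.
    pairs = []
    for game in games:
        xs, ys = game_to_dataset(game)
        pairs.extend(zip(xs, ys))
    random.shuffle(pairs)  # break the strong correlation between positions of the same game
    xs, ys = zip(*pairs)
    model.fit(list(xs), list(ys), batch_size=batch_size)
```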
My project is here if anyone wants to take a closer look at it: CheckerBot