
I have implemented an AI agent that plays checkers, based on the design described in the first chapter of *Machine Learning* by Tom Mitchell (McGraw Hill, 1997).

We train the agent by letting it play against itself.

I wrote the prediction function to estimate how good a board is for white, so when white plays it must choose the next board with the maximum predicted value, and when black plays it must choose the next board with the minimum predicted value.

I also let the agent explore other states by making it choose a random board among the valid next boards, with probability $0.1$.
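
A minimal sketch of this selection rule, assuming each board is already encoded as a feature vector and using a linear `predict` like the one described further down (the function and parameter names are mine, not from the book):

```python
import random

EPSILON = 0.1  # exploration probability used in my experiments

def predict(board, weights):
    """Linear evaluation of a board for white.

    board is assumed to be a feature vector (x1..xn); weights[0] plays the
    role of the bias term w0 in Mitchell's formulation.
    """
    return weights[0] + sum(w * x for w, x in zip(weights[1:], board))

def choose_next_board(next_boards, weights, white_to_move, epsilon=EPSILON):
    """White picks the board with the maximum predicted value, black the
    minimum; with probability epsilon a random valid board is chosen."""
    if random.random() < epsilon:
        return random.choice(next_boards)
    value = lambda b: predict(b, weights)
    return max(next_boards, key=value) if white_to_move else min(next_boards, key=value)
```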

The final boards get the following training values:

- $+100$ if the final board is a win for white,
- $-100$ if it is a loss for white,
- $0$ if it is a draw.

Each intermediate board gets a training value equal to the current model's prediction for the next board at which it is white's turn (a sketch of this assignment follows).
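
Concretely, the training-value assignment can be sketched like this, reusing `predict` from the snippet above; `white_boards` is assumed to be the sequence of feature vectors of the boards at which white moved, in game order, so the last one is the final board:

```python
WIN, LOSS, DRAW = 100, -100, 0  # terminal training values from above

def training_values(white_boards, outcome, weights):
    """Pair each of white's boards with its training value.

    Following Mitchell's rule V_train(b) = V_hat(Successor(b)): every
    intermediate board is labelled with the current model's prediction of
    the *next* board at which it is white's turn; the final board gets the
    fixed outcome value (WIN, LOSS, or DRAW from white's point of view).
    """
    pairs = []
    for i, board in enumerate(white_boards):
        if i + 1 < len(white_boards):
            pairs.append((board, predict(white_boards[i + 1], weights)))
        else:
            pairs.append((board, outcome))
    return pairs
```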

The model is a linear combination of board features (see the book for the full description).

I start by initializing the parameters of the model to random values.
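
For concreteness, a sketch of that initialization (the number of features and the range are my assumptions; the book only says the weights start at random values):

```python
import random

NUM_FEATURES = 6  # e.g. Mitchell's x1..x6 board features; use your own count

# one weight per feature plus the bias term w0, drawn from a small range
weights = [random.uniform(-0.5, 0.5) for _ in range(NUM_FEATURES + 1)]
```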

When I train the agent, it always ends up losing to itself or drawing in a silly way, even though the training error converges to zero.

I thought the learning rate might need to be smaller (e.g. $10^{-5}$), and when I do that the agent learns much better.
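
The learning rate enters through the LMS weight-update rule from the book; a sketch of one update step, reusing `predict` and `weights` from the snippets above, with the small rate that worked for me as the default:

```python
def lms_update(weights, board, target, learning_rate=1e-5):
    """One LMS step: w_i <- w_i + eta * (V_train(b) - V_hat(b)) * x_i.

    board is the feature vector of a training board and target is its
    training value; the weights are updated in place and also returned.
    """
    error = target - predict(board, weights)
    weights[0] += learning_rate * error          # bias term (x0 = 1)
    for i, x in enumerate(board, start=1):
        weights[i] += learning_rate * error * x
    return weights
```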

I think this happens because of the credit assignment problem: a good move may appear in a losing game and therefore get treated as a bad move, so white will never choose it when playing. When the learning rate is very small, the presence of a good move in a single losing game changes its value only by a tiny amount, and since that good move should appear more often in winning games, its value converges to the right one.

Is my reasoning correct? If not, what is actually happening?

Kais Hasan
  • Please, put your main **specific** question in the title. "AI agent playing checkers" is not a question and it's very general. If you have more than one specific question, split this post into multiple ones (one for each of the questions that you have). – nbro Dec 31 '20 at 18:04
  • @nbro sorry for that, I edited the question, I hope it is more clear now. – Kais Hasan Dec 31 '20 at 18:29
  • There are a couple of things that might be off here. Number one is, as you mentioned, the learning rate. Number two is the exploration policy (i.e. taking a random action only 10% of the time); you might want to try a higher chance of a random move for a while and then progressively lower it. Number three is that the agent is playing against itself, but when it's black its aim is to lose. Is the agent fed the information of who it is (black or white) as part of the observation? This would be essential – Federico Malerba Jan 02 '21 at 21:12
  • @FedericoMalerba I tried the second point you mentioned, but that does not solve the problem. I do not understand the third point: the prediction is based on the evaluation of the board for white, so black needs to minimize the board's goodness for white and therefore chooses the minimum. For training, I only take the boards where it is white's turn, so there is no need for the colour. I take this from the book I mentioned at the top of the question. – Kais Hasan Jan 03 '21 at 09:45
