Questions tagged [tic-tac-toe]

For questions about the game of tic-tac-toe (also known as noughts and crosses) in the context of artificial intelligence.

15 questions
9 votes · 3 answers

How should I represent the input to a neural network for the games of tic-tac-toe, checkers or chess?

I've been reading a lot about TD-Gammon recently as I'm exploring options for AI in a video game I'm making. The video game is a turn-based positional sort of game, i.e. a unit's, or game piece's, position will greatly impact its usefulness in that…
6 votes · 2 answers

Why does self-playing tic-tac-toe not become perfect?

I trained a DQN that learns tic-tac-toe by playing against itself with a reward of -1/0/+1 for a loss/draw/win. Every 500 episodes, I test the progress by letting it play some episodes (also 500) against a random player. As shown in the picture…
6 votes · 1 answer

What are good learning strategies for Deep Q-Network with opponents?

I am trying to find out what some good learning strategies are for a Deep Q-Network with opponents. Let's consider the well-known game tic-tac-toe as an example: how should an opponent be implemented to get good and fast improvements? Is it better to…
4 votes · 1 answer

How do we find the length (depth) of the game tic-tac-toe in adversarial search?

When we play tic-tac-toe using adversarial search, I know how to build the tree. Is there a way to find the depth of the tree, and to determine which level is the last?
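
Since the question asks directly for the tree's depth, here is a minimal brute-force sketch (illustrative code, not taken from the question) that enumerates every legal game: the deepest game fills the board at 9 plies, and the shallowest finished game ends after 5 moves.

```python
# Brute-force the full tic-tac-toe game tree and record the depth
# (number of moves from the empty board) of every terminal position.

WIN_LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def winner(b):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for i, j, k in WIN_LINES:
        if b[i] is not None and b[i] == b[j] == b[k]:
            return b[i]
    return None

def terminal_depths(b, player, depth):
    """Yield the depth of every terminal position reachable below board b."""
    if winner(b) is not None or all(c is not None for c in b):
        yield depth
        return
    for i in range(9):
        if b[i] is None:
            b[i] = player
            yield from terminal_depths(b, 'O' if player == 'X' else 'X', depth + 1)
            b[i] = None

depths = list(terminal_depths([None] * 9, 'X', 0))
print(max(depths), min(depths))  # 9 5 -- a full-board draw vs. the fastest win
```

So the last level of the tree is level 9 (a full board), while some branches already terminate at level 5, the earliest a win is possible.
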
4 votes · 1 answer

Why is tic-tac-toe considered a non-deterministic environment?

I have been reading about deterministic and stochastic environments, when I came across an article that states that tic-tac-toe is a non-deterministic environment. But why is that? An action will lead to a known state of the game and an agent has…
3 votes · 2 answers

How can both agents know the terminal reward in self-play reinforcement learning?

There seems to be a major difference in how the terminal reward is received/handled in self-play RL vs "normal" RL, which confuses me. I implemented TicTacToe the normal way, where a single agent plays against an environment that manages the state…
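
A sketch of the usual fix (the helper name and buffer layout here are hypothetical, assuming win = +1 / loss = -1 / draw = 0): in self-play, the player who did *not* make the final move never acts again, so the terminal reward must also be written back into that player's last stored transition.

```python
# Each buffer holds (state, action, reward, next_state, done) tuples,
# one buffer per player.

def finish_episode(buffer_x, buffer_o, result):
    """result: +1 if X won, -1 if O won, 0 for a draw."""
    if buffer_x:
        s, a, _, s2, _ = buffer_x[-1]
        buffer_x[-1] = (s, a, float(result), s2, True)   # X's view of the outcome
    if buffer_o:
        s, a, _, s2, _ = buffer_o[-1]
        buffer_o[-1] = (s, a, float(-result), s2, True)  # zero-sum: O gets the negation
```

With this bookkeeping, both agents see the terminal reward from their own perspective, which is exactly what a single-agent setup gets for free from the environment.
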
3 votes · 1 answer

Given these two reward functions, what can we say about the optimal Q-values, in self-play tic-tac-toe?

This corresponds to Exercise 1.1 of Sutton & Barto's book (2nd edition), and a discussion followed from this answer. Consider the following two reward functions Win = +1, Draw = 0, Loss = -1 Win = +1, Draw or Loss = 0 Can we say something about…
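
A hedged aside on the exercise (assuming, as in the book's setup, undiscounted returns with a reward only at the terminal state): for any policy $\pi$, the two reward schemes give

$$V_\pi^{(1)}(s) = p_{\text{win}}(s) - p_{\text{loss}}(s), \qquad V_\pi^{(2)}(s) = p_{\text{win}}(s),$$

where $p_{\text{win}}$ and $p_{\text{loss}}$ are the win and loss probabilities from state $s$ under $\pi$. The second scheme is indifferent between a draw and a loss, so a policy optimal under it may fail to avoid losses from states where no win is reachable.
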
2 votes · 3 answers

What is the optimal score for Tic Tac Toe for a reinforcement learning agent against a random opponent?

I guess this problem is encountered by everyone trying to solve Tic Tac Toe with various flavors of reinforcement learning. The answer is not "always win" because the random opponent may sometimes be able to draw the game. So it is slightly less…
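
That optimal score can be computed exactly with expectimax (a sketch, not from the question; rewards win = +1 / draw = 0 / loss = -1): X maximizes expected score while O is modeled as a uniformly random player. Since a perfect X never loses, the result is strictly between 0 and 1.

```python
from functools import lru_cache

WIN_LINES = ((0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6))

def winner(b):
    for i, j, k in WIN_LINES:
        if b[i] is not None and b[i] == b[j] == b[k]:
            return b[i]
    return None

@lru_cache(maxsize=None)
def value(board):
    """Expected score for X: X plays optimally, O plays uniformly at random."""
    b = list(board)
    w = winner(b)
    if w is not None:
        return 1.0 if w == 'X' else -1.0
    moves = [i for i, c in enumerate(b) if c is None]
    if not moves:
        return 0.0                        # draw
    to_move = 'X' if len(moves) % 2 == 1  else 'O'   # X moves first
    vals = []
    for m in moves:
        b[m] = to_move
        vals.append(value(tuple(b)))
        b[m] = None
    # X picks the best move; the random O averages over all of its moves.
    return max(vals) if to_move == 'X' else sum(vals) / len(vals)

print(round(value((None,) * 9), 4))  # strictly between 0 (all draws) and 1 (always wins)
```

The printed number is the benchmark the question is after: the best achievable average score against this particular random opponent.
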
2 votes · 1 answer

Why can I still easily beat my Q-learning agent that was trained against another Q-learning agent to play tic tac toe?

I implemented the Q-learning algorithm to play tic-tac-toe. The AI plays against the same algorithm, but they don't share the same Q matrix. After 200,000 games, I still beat the AI very easily and it's rather dumb. My selection is made by epsilon…
2 votes · 1 answer

Why isn't my Q-Learning agent able to play tic-tac-toe?

I tried to build a Q-learning agent that you can play tic-tac-toe against after training. Unfortunately, the agent performs pretty poorly. It tries to win but does not try to stop me from winning, which ends with me beating the agent no matter…
1 vote · 0 answers

How do I improve my RL tic-tac-toe agent?

I have coded a neural-network-based RL tic-tac-toe agent. It trains well enough to win against random agents almost all the time; the larger the board (the code allows training on NxN boards with a winning line longer than 3), the closer to…
1 vote · 1 answer

How are rewards calculated for episodic tasks like playing chess or tic-tac-toe?

I am new to reinforcement learning and trying to understand the concept of reaping rewards during episodic tasks. I think in games like tic-tac-toe, rewards will be in terms of a win or a loss. But does that mean we need to finish the entire game to…
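
A Monte Carlo-style sketch of the standard answer (assuming a terminal-only reward): even though the only nonzero reward arrives when the game ends, every earlier move receives credit through the discounted return.

```python
# Compute the discounted return G_t for each move of one finished episode.

def returns(rewards, gamma=0.9):
    """rewards: one entry per move, typically [0, 0, ..., +1/-1/0]."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g        # fold the terminal reward backwards in time
        out.append(g)
    return list(reversed(out))

print(returns([0, 0, 0, 0, 1]))  # earlier moves get geometrically discounted credit
```

So with Monte Carlo methods the episode does need to finish before updating; temporal-difference methods instead bootstrap from value estimates and can update after every move.
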
1 vote · 2 answers

Non-Neural Network algorithms for large state space in zero sum games

I was reading online that tic-tac-toe has a state space of $3^9 = 19,683$. From my basic understanding, this sounds too large to use tabular Q-learning, as the Q table would be huge. Is this correct? If that is the case, can you suggest other…
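
A quick sanity check on that premise (the base-3 encoding below is just one illustrative choice): $3^9$ states is actually tiny by tabular standards, so a plain in-memory Q-table is entirely feasible for tic-tac-toe.

```python
# Encode a board as a base-3 integer and size the full tabular Q-function.

def encode(board):
    """board: 9 entries in {0: empty, 1: X, 2: O} -> integer in [0, 3**9)."""
    s = 0
    for cell in board:
        s = s * 3 + cell
    return s

n_states = 3 ** 9                               # 19,683; many boards are unreachable
q_table = [[0.0] * 9 for _ in range(n_states)]  # one row per state, one entry per move
print(n_states)                                  # small enough for a plain table
```

At roughly 19,683 × 9 floats, the whole table fits in a couple of megabytes; non-neural alternatives only become necessary at much larger state spaces.
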
1 vote · 1 answer

How to make the RL player a perfect/expert (tic-tac-toe/chess) player?

I asked a question related to tic-tac-toe playing in RL. From the answer, it seems to me a lot is dependent on the opponent (rightly so, if we write down the expectation equations). My questions are (in the context of tic-tac-toe or chess): How to…
1 vote · 1 answer

In tic-tac-toe, what is the effect of the starting state on the state and action value function?

I am simulating a Tic-Tac-Toe game with a human opponent. The way the RL agent trains is through policy/value iteration for a fixed number of iterations, all specified by the user. Now, whether the human player has turn 1 or turn 2 will decide the…