
I have been reading about deterministic and stochastic environments, and I came across an article that states that tic-tac-toe is a non-deterministic environment.

But why is that?

An action will lead to a known state of the game, and the agent has full knowledge of the board and of its opponent's past moves.

EEAH
    You say "I came across an article that states that tic-tac-toe is a non-deterministic environment". Can you please provide the link to the article? – nbro Aug 03 '20 at 02:24

1 Answer


The game of TIC-TAC-TOE can be modelled as a non-deterministic Markov decision process (MDP) if, and only if:

  • The opponent is considered part of the environment. This is a reasonable approach when the goal is to learn how to play against a specific opponent.

  • The opponent is using a stochastic policy. Stochastic policies are a generalisation that includes deterministic policies as a special case, so this is a reasonable default assumption.

"An action will lead to a known state of the game, and the agent has full knowledge of the board and of its opponent's past moves."

Whilst this is true, the next state and reward observed by the agent are generally not determined by the position the agent's own move creates (unless that move wins or draws the game), but by the position after the opponent has replied.
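For illustration only (the names here, such as `step` and `opponent_policy`, are my own and not part of any standard API), a minimal sketch of what this looks like from the agent's side: the environment's step includes the opponent's reply, and when that reply is sampled from a stochastic policy, the same state and action can lead to different next states.

```python
import random

# Board: list of 9 cells, 'X' for the agent, 'O' for the opponent, None for empty.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def step(board, agent_move, opponent_policy=random.choice):
    """Agent plays, then the opponent (treated as part of the environment) replies."""
    board = list(board)
    board[agent_move] = 'X'
    if winner(board) == 'X' or None not in board:
        return board, True  # terminal: the agent's move won or filled the board
    # The opponent samples its move, so the same (board, agent_move) pair
    # can produce different next states on different calls.
    empty = [i for i, cell in enumerate(board) if cell is None]
    board[opponent_policy(empty)] = 'O'
    return board, winner(board) == 'O' or None not in board

# Same state, same action, possibly different observed next states:
start = [None] * 9
print(step(start, 4))
print(step(start, 4))
```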

It is also possible to frame TIC-TAC-TOE as a partially observable MDP (POMDP) if you consider the opponent not to have a fixed policy, but to be reacting to the play so far, perhaps even learning from past games. In that case, the internal state of the opponent is the unknown part of the state. In standard game-playing engines, and in games of perfect information, this is resolved by assuming the opponent will make the best possible (or rational) move, which can be determined using a search process such as minimax. When there is imperfect information, such as in poker, it becomes much harder to allow for an opponent's actions.
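As a sketch of that "rational opponent" resolution (again my own illustration, reusing the `winner` helper and board representation from the snippet above), plain minimax computes the value of a position under best play by both sides; starting from the empty board it returns 0, i.e. perfect play ends in a draw.

```python
def minimax(board, player):
    """Value of the position for 'X' under best play: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    empty = [i for i, cell in enumerate(board) if cell is None]
    if not empty:
        return 0  # board full, no winner: draw
    values = []
    for move in empty:
        child = list(board)
        child[move] = player
        values.append(minimax(child, 'O' if player == 'X' else 'X'))
    return max(values) if player == 'X' else min(values)

print(minimax([None] * 9, 'X'))  # 0: tic-tac-toe is a draw with perfect play
```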

Neil Slater
  • Thank you. But from this answer, every multi-agent environment can be viewed as non-deterministic, because `the next state and reward observed by the agent are generally not determined by the position the agent's own move creates (unless that move wins or draws the game), but by the position after the opponent has replied`. Is there any example of a multi-agent system with a deterministic environment? – EEAH Aug 03 '20 at 14:08
  • @EEAH: That is correct. It will depend on how you define and implement multi-agent environments. However, if the agent needs to account for the behaviour of the other agents, then the result can be non-deterministic. You can reasonably say that the core of the environment is deterministic, in the same sense that TIC-TAC-TOE is a deterministic game, but the agents may often need to deal practically with non-deterministic and/or partially observable features, regardless of whether you attribute that to the other agents separately or consider them part of the environment. – Neil Slater Aug 03 '20 at 14:25