2

The following quote (from AIMA) refers to the situation in which the minimax algorithm computes its values directly from the terminal states.

(The) definition of optimal play for MAX assumes that MIN also plays optimally—it maximizes the worst-case outcome for MAX. What if MIN does not play optimally? Then it is easy to show (...) that MAX will do even better. Other strategies against suboptimal opponents may do better than the minimax strategy, but these strategies necessarily do worse against optimal opponents.

But what about minimax using an evaluation function for nodes at a certain depth? Can we achieve optimality? If we do not have a perfect evaluation function, is the best we can do to achieve optimality with respect to the specific evaluation function?

Do considerations in the quote (do even better against a suboptimal opponent) still hold?

nbro
  • 39,006
  • 12
  • 98
  • 176
Kestina
  • 21
  • 1

1 Answers1

0

You can take any two player zero-sum game and change its rules, so that they become:

  • Start from the game state being evaluated.
  • Play for up to N turns.
  • If no winner is found after N turns, the winner is the one with the highest heuristic score (calculated in the same way as your minimax algorithm).

This is still a valid game, based on the original. A minimax using the same non-terminal heuristic will solve that derived/limited game optimally.

The solution may be unrelated to solutions to the original game. As a worst case, choose as your heuristic a random oracle (that assigns a random number to each state, in practice this could be done using a cryptographic hashing function). In the extreme case of N=1, then a random oracle heuristic will perform - on the original game - much the same as a random player, until either player is 1 turn away from winning. As N increases to include more possible game ends, the effect of the poor heuristic is reduced, as forced wins/losses are discovered (and usually those parts of the search tree removed because one of the players would reject them).

If you have a perfect heuristic, then the game is solved, and searching with N=1 against that heuristic will result in optimal play for the original game. Here's an example for tic-tac-toe where the choice for search depth 1 is encoded into the grid

After you take the move identified by the heuristic, then the horizon N moves forward. Using the "similar game" idea above, the derived game has changed. It is feasible that a different choice would have been better according to the heuristic evaluated on N+1 turns, or that the search will now find a forced win, tie or loss which was previously beyond its horizon.

Do considerations in the quote (do even better against a suboptimal opponent) still hold?

Yes, but only in terms of potentially being able to score more highly on the heuristic by turn N. The search will identify a next move where the worst case scenario (assuming the other player chooses a response which minimises your possible score by turn N) is a certain heuristic score on turn N. If you are not in that worst case due to the other player's choice, it may be possible to do better and reach a higher score by turn N.

However, because the horizon will move forward following a turn, you now have further information from searching deeper into the game, and that could mean that the other player's move was in fact worse for you. You can still guarantee a heuristic score by turn N at least as good as the one found before, but now you can see turn up to N+2, and a new search may find important differences.

Neil Slater
  • 28,678
  • 3
  • 38
  • 60