I have a series of games with the following properties:
- 3 or more players, but purely non-cooperative (i.e., no coalition formation);
- sequential moves;
- perfect information;
- deterministic state transitions and rewards; and
- game size large enough that approximate methods are required (e.g., on the order of $1000^{120}$ states for certain problem instances).
For example, Chinese Checkers, or, closer to my own work, a multi-player knapsack problem (where the players, in round-robin fashion, choose items without replacement from a shared set, each aiming to maximize the value of their own knapsack).
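For concreteness, here is a minimal sketch of how I model the multi-player knapsack game. All names, and the per-player capacity rule, are my own modeling assumptions, not a fixed specification:

```python
# Minimal model of the multi-player knapsack game: players move in
# round-robin order, each taking a remaining item (without replacement)
# that fits in their own capacity-limited knapsack. Transitions and
# rewards are deterministic, matching the properties listed above.
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    value: int
    weight: int

@dataclass
class KnapsackGame:
    items: list            # remaining shared items
    capacity: int          # per-player knapsack capacity (my assumption)
    n_players: int = 3
    turn: int = 0          # index of the player to move
    loads: list = None     # weight used so far, per player
    scores: list = None    # value accumulated so far, per player

    def __post_init__(self):
        if self.loads is None:
            self.loads = [0] * self.n_players
        if self.scores is None:
            self.scores = [0] * self.n_players

    def legal_moves(self):
        # any remaining item that still fits; otherwise the player passes (None)
        fits = [i for i, it in enumerate(self.items)
                if self.loads[self.turn] + it.weight <= self.capacity]
        return fits or [None]

    def play(self, move):
        # deterministic transition: returns the successor state
        items = list(self.items)
        loads, scores = list(self.loads), list(self.scores)
        if move is not None:
            it = items.pop(move)
            loads[self.turn] += it.weight
            scores[self.turn] += it.value
        return KnapsackGame(items, self.capacity, self.n_players,
                            (self.turn + 1) % self.n_players, loads, scores)

    def is_terminal(self):
        # the game ends once no remaining item fits any player's knapsack
        return all(self.loads[p] + it.weight > self.capacity
                   for p in range(self.n_players) for it in self.items)
```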
Question: which policy improvement operators or algorithms (a) converge to optimality, or (b) yield reasonably strong policies on these games?
What I have researched
- In a small enough game (e.g., 3-player Nim with a (3, 4, 5) starting board), full tree search is possible.
- In a one-person setting, certain exact dynamic programming formulations can reduce complexity; for example, a small enough knapsack is solved exactly by the standard array-based DP. I'm unsure whether, or how, these "cost-to-achieve" shortest-path formulations carry over to multi-player games.
- In a one-person setting, policy improvement approaches such as rollout and fortified rollout algorithms have the cost improvement property. I'm unsure whether this property carries over to multi-player versions.
- Work has been done (for example, this thesis) demonstrating that Monte Carlo Tree Search can generate strong policies on the types of games I'm interested in. I believe MCTS has been proven to converge to a Nash equilibrium in two-player perfect-information games, but I'm not aware of analogous guarantees for games with three or more players.
- For imperfect-information games, methods like Monte Carlo Counterfactual Regret Minimization (e.g., here) seem to be required for convergence guarantees. Since I'm working in a perfect-information environment, these look like overkill for my setting.
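On the full-tree-search point: the standard n-player analogue of minimax is max^n search, where each node returns a payoff vector and the player to move picks the child maximizing their own coordinate. A sketch on a toy single-pile, 3-player Nim variant (the rules here, take 1–3 stones and score 1 for taking the last stone, are my own illustration, not the (3,4,5) game above):

```python
# Exhaustive max^n search: propagate payoff vectors up the tree, with the
# player to move maximizing their own component. Memoized, so it only
# works on games small enough to enumerate, per the bullet above.
from functools import lru_cache

N_PLAYERS = 3

@lru_cache(maxsize=None)
def maxn(pile, turn):
    """Payoff vector (one entry per player) under max^n play."""
    if pile == 0:
        # the previous player took the last stone and wins
        winner = (turn - 1) % N_PLAYERS
        return tuple(1 if p == winner else 0 for p in range(N_PLAYERS))
    best = None
    for take in range(1, min(3, pile) + 1):
        child = maxn(pile - take, (turn + 1) % N_PLAYERS)
        # the mover keeps the child best for their own coordinate
        # (ties are broken arbitrarily, a known subtlety of max^n)
        if best is None or child[turn] > best[turn]:
            best = child
    return best
```

Note that max^n only guarantees an equilibrium of the tree under some tie-breaking rule; unlike two-player minimax, the backed-up value is not unique when players are indifferent among children.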
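For reference, the exact one-player DP I have in mind is the classic O(nC) array-based 0/1 knapsack:

```python
# Classic 0/1 knapsack DP: best[c] holds the best achievable value with
# remaining capacity c. Runs in O(n * capacity) time and O(capacity) space.
def knapsack(values, weights, capacity):
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # iterate capacities downward so each item is used at most once
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]
```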
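And the one-player rollout idea, sketched on the knapsack with a greedy-by-value base heuristic (the base policy is my own choice for illustration). The cost improvement property says the rollout policy does at least as well as the base policy; the instance in the usage test below is one where it does strictly better:

```python
# One-player rollout: score each feasible first move by completing the
# remaining decisions with a cheap base heuristic, then commit to the
# move with the best simulated outcome, and repeat.
def greedy_completion(items, capacity):
    """Base policy: repeatedly take the highest-value item that fits."""
    total = 0
    for v, w in sorted(items, reverse=True):
        if w <= capacity:
            total += v
            capacity -= w
    return total

def rollout_policy(items, capacity):
    """One-step lookahead on top of the greedy base policy."""
    total = 0
    items = list(items)
    while True:
        candidates = [(v, w) for v, w in items if w <= capacity]
        if not candidates:
            return total
        def score(move):
            # value of taking `move` now, then following the base policy
            v, w = move
            rest = list(items)
            rest.remove(move)
            return v + greedy_completion(rest, capacity - w)
        move = max(candidates, key=score)
        items.remove(move)
        total += move[0]
        capacity -= move[1]
```

For example, with items `[(10, 10), (6, 5), (5, 5)]` (value, weight) and capacity 10, the greedy base policy takes the single value-10 item, while rollout discovers the 6 + 5 = 11 combination.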
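Finally, a compact multi-player UCT sketch of the kind of MCTS I mean: full payoff vectors are backed up, and each node selects using its own player's coordinate. Shown again on a toy single-pile, 3-player Nim (take 1–3 stones, last stone scores 1); this is illustrative only, with untuned constants:

```python
# Multi-player UCT: selection maximizes UCB1 in the moving player's
# coordinate, simulation is a uniform-random playout, and the full
# payoff vector is backed up along the visited path.
import math
import random

N_PLAYERS = 3

def legal(pile):
    return list(range(1, min(3, pile) + 1))

def step(pile, turn, take):
    return pile - take, (turn + 1) % N_PLAYERS

def payoff(turn):
    # the previous player took the last stone and wins
    winner = (turn - 1) % N_PLAYERS
    return [1.0 if p == winner else 0.0 for p in range(N_PLAYERS)]

class Node:
    def __init__(self, pile, turn):
        self.pile, self.turn = pile, turn
        self.children = {}                  # move -> Node
        self.visits = 0
        self.totals = [0.0] * N_PLAYERS     # summed payoff vectors

def uct_search(pile, turn, iters=3000, c=1.4):
    root = Node(pile, turn)
    for _ in range(iters):
        node, path = root, [root]
        # selection: descend while the node is fully expanded
        while node.pile > 0 and len(node.children) == len(legal(node.pile)):
            node = max(node.children.values(), key=lambda ch:
                       ch.totals[node.turn] / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
            path.append(node)
        # expansion: add one untried child
        if node.pile > 0:
            move = random.choice([m for m in legal(node.pile)
                                  if m not in node.children])
            child = Node(*step(node.pile, node.turn, move))
            node.children[move] = child
            node, path = child, path + [child]
        # simulation: uniform-random playout to the end of the game
        pile_, turn_ = node.pile, node.turn
        while pile_ > 0:
            pile_, turn_ = step(pile_, turn_, random.choice(legal(pile_)))
        result = payoff(turn_)
        # backpropagation of the whole payoff vector
        for n in path:
            n.visits += 1
            for p in range(N_PLAYERS):
                n.totals[p] += result[p]
    # recommend the most-visited root move
    return max(root.children, key=lambda m: root.children[m].visits)
```

With 2 stones on the pile, taking both stones wins immediately, and the search concentrates its visits on that move.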