I want to create an RL agent for a mancala-type two-player game as my first real project in the field. I've already implemented the game itself and coded a minimax algorithm for it.
The question is: how should I proceed? Which is the better way: to create a custom OpenAI Gym environment and use Stable Baselines algorithms, or to build an AlphaZero-like Monte Carlo Tree Search (MCTS) algorithm from scratch?
People here suggested that it is easier to build MCTS than to use Gym, since the latter does not natively support multiplayer games. But I thought I could incorporate my minimax algorithm into a custom environment as the built-in opponent, and since I already have both the game and the minimax algorithm, using Gym seems easier to me than writing MCTS from scratch.
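To make the idea concrete, here is a minimal sketch of the pattern I have in mind (all names are my own, and I've swapped in a toy take-1-to-3 Nim stand-in instead of mancala to keep it short): a single-agent, Gym-style environment whose `step()` hides the second player by letting a minimax opponent reply immediately after the agent's move.

```python
# Hypothetical sketch: a Gym-style (reset/step) environment with a minimax
# opponent baked into step(). The game is a toy Nim variant (take 1-3 stones,
# whoever takes the last stone wins), standing in for the real mancala game.

class MinimaxOpponentEnv:
    """Follows the Gym reset/step convention without importing gym itself."""

    def __init__(self, stones=10):
        self.start = stones
        self.stones = stones

    def reset(self):
        self.stones = self.start
        return self.stones  # observation: stones remaining

    def _value(self, stones):
        # Minimax value for the player to move: +1 = win, -1 = loss.
        if stones == 0:
            return -1  # the previous player took the last stone and won
        return max(-self._value(stones - k)
                   for k in range(1, min(3, stones) + 1))

    def _opponent_move(self, stones):
        # Pick the take (1-3 stones) that maximizes the opponent's value.
        return max(range(1, min(3, stones) + 1),
                   key=lambda k: -self._value(stones - k))

    def step(self, action):
        # Agent takes action+1 stones (clipped to what is left).
        self.stones -= min(action + 1, self.stones)
        if self.stones == 0:
            return self.stones, 1, True, {}   # agent took the last stone: win

        # Opponent replies inside the same step() call via minimax.
        self.stones -= self._opponent_move(self.stones)
        if self.stones == 0:
            return self.stones, -1, True, {}  # opponent won

        return self.stones, 0, False, {}
```

With this shape the learner sees an ordinary single-agent environment, so off-the-shelf Stable Baselines algorithms should apply directly; my worry is whether training against one fixed scripted opponent like this causes the policy to overfit to it.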
Are there any pitfalls I should avoid?