I want to create an RL agent for a mancala-type two-player game as my first real project in the field. I've already implemented the game itself and coded a minimax algorithm for it.
The question is: how should I proceed? Which is the better way: to create a custom OpenAI Gym environment and use Stable Baselines algorithms, or to build an AlphaZero-like Monte Carlo Tree Search (MCTS) algorithm from scratch?
People here suggested that it is easier to build MCTS than to use Gym, since the latter does not natively support multiplayer games. But I thought I could incorporate my minimax algorithm into a custom environment as the built-in opponent, and since I already have both the game and the minimax algorithm, using Gym seems easier to me than writing MCTS from scratch.
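To make the idea concrete, here is a minimal sketch of the pattern I have in mind (all names are my own, and I've swapped in a toy take-1-to-3 Nim stand-in instead of mancala to keep it short): a single-agent, Gym-style environment whose `step()` hides the second player by letting a minimax opponent reply immediately after the agent's move.

```python
# Hypothetical sketch: a Gym-style (reset/step) environment with a minimax
# opponent baked into step(). The game is a toy Nim variant (take 1-3 stones,
# whoever takes the last stone wins), standing in for the real mancala game.

class MinimaxOpponentEnv:
    """Follows the Gym reset/step convention without importing gym itself."""

    def __init__(self, stones=10):
        self.start = stones
        self.stones = stones

    def reset(self):
        self.stones = self.start
        return self.stones  # observation: stones remaining

    def _value(self, stones):
        # Minimax value for the player to move: +1 = win, -1 = loss.
        if stones == 0:
            return -1  # the previous player took the last stone and won
        return max(-self._value(stones - k)
                   for k in range(1, min(3, stones) + 1))

    def _opponent_move(self, stones):
        # Pick the take (1-3 stones) that maximizes the opponent's value.
        return max(range(1, min(3, stones) + 1),
                   key=lambda k: -self._value(stones - k))

    def step(self, action):
        # Agent takes action+1 stones (clipped to what is left).
        self.stones -= min(action + 1, self.stones)
        if self.stones == 0:
            return self.stones, 1, True, {}   # agent took the last stone: win

        # Opponent replies inside the same step() call via minimax.
        self.stones -= self._opponent_move(self.stones)
        if self.stones == 0:
            return self.stones, -1, True, {}  # opponent won

        return self.stones, 0, False, {}
```

With this shape the learner sees an ordinary single-agent environment, so off-the-shelf Stable Baselines algorithms should apply directly; my worry is whether training against one fixed scripted opponent like this causes the policy to overfit to it.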
Are there any pitfalls I should avoid?