
I have built a custom multi-agent environment with PettingZoo that sets up a turn-based game between two agents, A and B.

I want to examine situations where malicious behavior may arise, given the game rules, and I am looking into training approaches.

To do that, I have implemented a deterministic policy as a baseline / control.

With agent A fixed to that baseline policy, I want to train agent B and observe the resulting behaviors.

After B arrives at a desirable behavioral pattern, I want to train agent A to see how it responds to B's actions.
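For concreteness, the "A is frozen, B learns" phase can be run with PettingZoo's standard AEC rollout loop, where the fixed baseline supplies A's actions and only B's experience is collected for learning. The sketch below is just an illustration of that idea, not my actual code: it assumes agent IDs `"agent_A"` and `"agent_B"`, a callable `baseline_policy_A` (the deterministic control), and a `learner` object for B; all of these names are placeholders, and B's actual experience collection and update step are elided.

```python
# Minimal sketch of the "frozen A, learning B" phase, assuming a two-agent
# PettingZoo AEC environment `env` with agent IDs "agent_A" and "agent_B".
# `baseline_policy_A` and `learner` are hypothetical placeholders.

env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None                            # AEC API requires None for finished agents
    elif agent == "agent_A":
        action = baseline_policy_A(observation)  # frozen baseline: effectively part of the environment
    else:
        action = learner.act(observation)        # agent B's current (trainable) policy
        # learner.record(observation, action, reward, ...)  # collect B's transition for training

    env.step(action)

env.close()
```

Swapping the roles for the second phase (training A against a now-fixed B) would be the same loop with the two branches exchanged.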

With the above setting in mind:

  • Is the above training approach, which keeps one agent fixed and trains the other, correct?

    Should I follow a MARL training approach instead, or is the above approach, which encapsulates one agent as part of the environment, sound?

In general, what requirements / desiderata should I look for that hint that a MARL approach is the correct way and/or that a separate training scheme is erroneous?

