I recently read the paper *Efficient Competitive Self-Play Policy Optimization*, which proposes an algorithm for training a population of agents with self-play, using a perturbation-based matchmaking approach.
I was wondering whether this algorithm could also be used for regular multi-agent competition, e.g. an asymmetric game like robo-soccer with one goalie and one striker.
Are there specific properties that self-play relies on which don't hold in general multi-agent competition? Can self-play approaches be used in scenarios where the two agents don't share the same policy? If not, what properties does one have to keep in mind when trying to adapt those approaches?
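To make the distinction I'm asking about concrete, here is a minimal sketch (all names like `Policy`, `perturb`, and the match functions are my own hypothetical illustrations, not the paper's actual API). In symmetric self-play, both sides of a match can be filled from one policy and its perturbed copies; in an asymmetric game, each role seems to need its own population, and opponents must be drawn from the other role's population:

```python
import random

class Policy:
    """Stand-in for a parameterized policy (hypothetical, for illustration)."""
    def __init__(self, params):
        self.params = params

def perturb(policy, scale=0.1):
    # Perturbation-based opponent: a noisy copy of the current policy,
    # roughly in the spirit of perturbation-based matchmaking.
    return Policy(policy.params + random.gauss(0.0, scale))

def symmetric_selfplay_match(policy):
    # Self-play: both agents come from the SAME policy (or a perturbation
    # of it), so a single set of parameters covers both roles.
    return policy, perturb(policy)

def asymmetric_match(goalie_pop, striker_pop):
    # Asymmetric competition: each role keeps its own population, and an
    # opponent must be sampled from the OTHER role's population.
    return random.choice(goalie_pop), random.choice(striker_pop)
```

My rough intuition is that the second setup breaks the self-play assumption that an agent's past (or perturbed) selves are valid opponents for itself, which is what I'd like to understand better.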