I recently read the paper *Efficient Competitive Self-Play Policy Optimization*, which proposes an algorithm for training a population of agents with self-play, using a perturbation-based matchmaking approach.
I was wondering whether this algorithm could also be used for regular multi-agent competition, e.g. an asymmetric game like robo-soccer with one goalie and one striker.
Are there specific properties that self-play relies on which don't hold in general multi-agent competition? Can self-play approaches be used in scenarios where the two agents don't share the same policy? If not, what properties does one have to keep in mind when trying to adapt those approaches?
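To make the distinction I'm asking about concrete, here is a minimal sketch (all names like `Policy`, `perturb`, and the match functions are my own hypothetical illustrations, not the paper's actual API). In symmetric self-play, both sides of a match can be filled from one policy and its perturbed copies; in an asymmetric game, each role seems to need its own population, and opponents must be drawn from the other role's population:

```python
import random

class Policy:
    """Stand-in for a parameterized policy (hypothetical, for illustration)."""
    def __init__(self, params):
        self.params = params

def perturb(policy, scale=0.1):
    # Perturbation-based opponent: a noisy copy of the current policy,
    # roughly in the spirit of perturbation-based matchmaking.
    return Policy(policy.params + random.gauss(0.0, scale))

def symmetric_selfplay_match(policy):
    # Self-play: both agents come from the SAME policy (or a perturbation
    # of it), so a single set of parameters covers both roles.
    return policy, perturb(policy)

def asymmetric_match(goalie_pop, striker_pop):
    # Asymmetric competition: each role keeps its own population, and an
    # opponent must be sampled from the OTHER role's population.
    return random.choice(goalie_pop), random.choice(striker_pop)
```

My rough intuition is that the second setup breaks the self-play assumption that an agent's past (or perturbed) selves are valid opponents for itself, which is what I'd like to understand better.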