
Is there a multi-agent deep reinforcement learning algorithm for environments with only discrete action spaces (not hybrid) that uses centralized training?

I have been looking at algorithms (A2C, MADDPG, etc.) but still haven't found one that provides all of the properties I mentioned (multi-agent + discrete action space + deep learning + centralized training).

I am wondering whether it would be a bad idea to use an actor network that takes the state as input and outputs the concatenated discrete actions of all agents. For example, with 4 agents that each have 3 actions, the output could be [0,0,1, 0,1,0, 0,0,1, 1,0,0].

Uur Kn
  • I haven't done RL in a while, but isn't the point of multi-agent that the agents act on their own and are thus distinct networks? Why would you use multi-agent if you merge them into one network? I remember the [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/ML-Agents-Overview.md) plugin providing this kind of multi-agent learning; maybe you could have a look at how they do it. – Ubikuity May 17 '21 at 13:01
  • @Ubikuity if the agents are homogeneous, then you can have a common NN policy that is executed in a decentralized way but trained centrally. Also, the last time I worked with Unity ML-Agents it didn't have a direct implementation of MARL algorithms. As I remember, it has the usual RL algorithms PPO and SAC, although they can work in simple MARL setups. – Felipe Costa May 18 '21 at 16:45

1 Answer


A natural policy for an environment with a discrete action space is a softmax over the available actions, i.e. a categorical distribution from which the action is sampled.
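To make the softmax idea concrete, here is a minimal sketch in plain Python. The logits are made-up numbers standing in for the output of an actor network, and the helper names `softmax` and `sample_action` are just for illustration, not taken from any particular library.

```python
import math
import random

def softmax(logits):
    """Turn a list of real-valued scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(logits, rng=random):
    """Sample one discrete action index from the softmax distribution."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical logits for an agent with 3 discrete actions.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
action = sample_action(logits, random.Random(0))
```

In a deep RL setting the logits would come from the last layer of the policy network, and the log-probability of the sampled action is what enters the policy-gradient loss.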

This paper describes a method that uses the idea of centralized training, and I believe it could be used in your implementation.

With regard to your last question, I don't know if I understood it correctly, but if you have a system that must perform 3 actions, you could assign each action to a specific agent (assuming we have three different action spaces). Then you would have a cooperative game with 3 agents that all share a common reward function. In theory, these 3 agents together represent a single agent that interacts with the environment.
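As a rough sketch of that factored view, using the numbers from the question (4 agents, 3 actions each): one flat output vector is split into one group of logits per agent, each agent samples its own action from a softmax over its group, and every agent receives the same reward. The random logits and the reward value are placeholders I made up, and this is not the method from the paper mentioned above.

```python
import math
import random

N_AGENTS = 4   # from the question's example
N_ACTIONS = 3  # actions per agent, from the question's example

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def joint_action(flat_logits, rng):
    """Sample one action per agent from a flat vector of logits.

    Returns the per-agent action indices and the concatenated one-hot
    encoding, e.g. [0,0,1, 0,1,0, 0,0,1, 1,0,0].
    """
    assert len(flat_logits) == N_AGENTS * N_ACTIONS
    actions, one_hot = [], []
    for i in range(N_AGENTS):
        group = flat_logits[i * N_ACTIONS:(i + 1) * N_ACTIONS]
        a = rng.choices(range(N_ACTIONS), weights=softmax(group), k=1)[0]
        actions.append(a)
        one_hot.extend(1 if j == a else 0 for j in range(N_ACTIONS))
    return actions, one_hot

rng = random.Random(0)
# Placeholder for the output of a shared actor network.
flat_logits = [rng.gauss(0.0, 1.0) for _ in range(N_AGENTS * N_ACTIONS)]
actions, one_hot = joint_action(flat_logits, rng)

# Cooperative setting: every agent gets the same (made-up) team reward.
shared_reward = 1.0
rewards = [shared_reward] * N_AGENTS
```

Applying a softmax per agent rather than treating the 12 outputs as one vector guarantees that exactly one action is selected for each agent.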

Felipe Costa