
I am developing an RL agent for a game environment. I have found that there are two strategies for doing well in the game, so I have trained two RL agents using neural networks with distinct reward functions, each reward function corresponding to one of the strategies.

If I use Q-learning, or value-based methods in general, it is easy to combine the results of the two agents to select the action that maximizes the overall value: one can simply add the two agents' Q-values for each action and pick the action with the largest sum.
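For concreteness, here is a minimal sketch of what I mean by adding the values (the Q-values and variable names below are hypothetical):

```python
import numpy as np

# Hypothetical Q-values for one state, one entry per action.
q_agent_a = np.array([0.2, 1.5, -0.3])  # Q(s, a) from agent A
q_agent_b = np.array([0.9, 0.1, 0.8])   # Q(s, a) from agent B

combined_q = q_agent_a + q_agent_b        # elementwise sum of the two value estimates
best_action = int(np.argmax(combined_q))  # greedy action under the combined values
print(best_action)  # -> 1, since 1.5 + 0.1 = 1.6 is the largest combined value
```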

My question is: is it possible to combine the results of policy-based methods, e.g. PPO? The output of a policy-based method is a probability distribution over actions, and I am not sure how to combine two such distributions.
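To illustrate why this is not obvious to me, two plausible ways of combining the distributions can already disagree on which action to take (the probabilities below are made up):

```python
import numpy as np

# Hypothetical action distributions from the two PPO agents for the same state.
pi_a = np.array([0.90, 0.08, 0.02])  # agent A's policy pi_A(a|s)
pi_b = np.array([0.02, 0.49, 0.49])  # agent B's policy pi_B(a|s)

mixture = 0.5 * (pi_a + pi_b)                  # average of the two distributions
product = (pi_a * pi_b) / np.sum(pi_a * pi_b)  # renormalized product of the two

print(np.argmax(mixture))  # -> 0
print(np.argmax(product))  # -> 1
```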
