
Some RL algorithms can only be used in environments with continuous action spaces (e.g. TD3, SAC), others only in environments with discrete action spaces (e.g. DQN), and some can be used in both.

My interpretation is that REINFORCE and other policy gradient variants can choose between a categorical policy for discrete action spaces and a Gaussian policy for continuous action spaces, which explains how they can support both. Is that interpretation completely correct?
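
To make that interpretation concrete, here is a minimal sketch of the two policy heads I have in mind (assuming PyTorch; the network and dimension names are made up purely for illustration):

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

# Hypothetical dimensions, for illustration only
obs_dim, n_actions, act_dim = 4, 3, 2

# Discrete action space: the network outputs one logit per action,
# and the policy is a categorical distribution over those logits.
discrete_head = nn.Linear(obs_dim, n_actions)

# Continuous action space: the network outputs a mean per action
# dimension, and the policy is a Gaussian with a (often learned) std.
mean_head = nn.Linear(obs_dim, act_dim)
log_std = nn.Parameter(torch.zeros(act_dim))

obs = torch.randn(1, obs_dim)

# Categorical policy: sample an integer action and its log-probability
pi_discrete = Categorical(logits=discrete_head(obs))
a_d = pi_discrete.sample()
logp_d = pi_discrete.log_prob(a_d)

# Gaussian policy: sample a real-valued action vector and its log-probability
pi_continuous = Normal(mean_head(obs), log_std.exp())
a_c = pi_continuous.sample()
logp_c = pi_continuous.log_prob(a_c).sum(-1)

# In both cases the REINFORCE update uses the same quantity: log pi(a|s)
```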

For algorithms that learn a Q-function, or both a Q-function and a policy, what is it that restricts their use to either discrete or continuous action space environments?
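
To illustrate what I mean by the restriction: my guess is that it has something to do with the max over actions in the Q-learning target, which is easy when the actions can be enumerated but not otherwise. A rough sketch of that difference (again PyTorch, with made-up shapes):

```python
import torch
import torch.nn as nn

# DQN-style Q-network for a discrete action space: one Q-value per action,
# so max_a Q(s', a) is just a max over the network's outputs.
obs_dim, n_actions = 4, 3
q_net = nn.Linear(obs_dim, n_actions)

next_obs = torch.randn(32, obs_dim)          # batch of next states
max_q = q_net(next_obs).max(dim=1).values    # enumerate all actions, take the max

# With a continuous action space, Q takes (s, a) as input and there is no
# finite set of outputs to enumerate, so max_a Q(s, a) becomes an
# optimisation problem in itself -- which, as far as I understand, is why
# TD3/SAC learn a separate actor rather than taking an explicit max.
act_dim = 2
q_net_cont = nn.Linear(obs_dim + act_dim, 1)
some_action = torch.randn(32, act_dim)
q_value = q_net_cont(torch.cat([next_obs, some_action], dim=1))
```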

In the same regard, if an algorithm suited for discrete action spaces is to be adapted to handle continuous action spaces, or vice versa, what does such a modification involve?

  • So, essentially, you're asking two questions: 1. in general, what makes an RL algorithm applicable to continuous and/or discrete action spaces? 2. how can we adapt an RL algorithm that is applicable to continuous action spaces to discrete ones (and vice-versa)? – nbro Nov 15 '20 at 13:47
  • Yes @nbro. Would it be more recommended to ask them separately in this case? – mugoh Nov 16 '20 at 17:10
  • These are really very related questions, so I am not so sure. Maybe, in this case, I would leave it like this, because these 2 questions can be merged into 1 where you're asking which RL algorithms can deal with continuous (or discrete) action spaces, either if they support it out of the box or not. – nbro Nov 16 '20 at 17:13
  • Sure, they can be merged into one, but I also feel that asking which RL algorithms deal with continuous or discrete spaces will leave out the question about what "features" make an algorithm applicable to a certain action space, and how they would be transferable to an algorithm that's applicable to a different action space. – mugoh Nov 17 '20 at 06:59

0 Answers