Some RL algorithms can only be used in environments with continuous action spaces (e.g. TD3, SAC), others only with discrete action spaces (e.g. DQN), and some with both.
My understanding is that REINFORCE and other policy gradient variants can use either a categorical policy for discrete action spaces or a Gaussian policy for continuous action spaces, which is how they can support both. Is that interpretation completely correct?
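To make clear what I mean by that "choice of policy", here is a minimal sketch of the two policy heads I have in mind (PyTorch; the network sizes and class names are just placeholders, not from any particular implementation). The REINFORCE update itself would look the same in both cases, only the distribution changes:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class DiscretePolicy(nn.Module):
    """Categorical policy: outputs logits over a finite set of actions."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs):
        return Categorical(logits=self.net(obs))

class GaussianPolicy(nn.Module):
    """Gaussian policy: outputs a mean per action dimension, with a learned log-std."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, act_dim))
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        return Normal(self.net(obs), self.log_std.exp())

# Either way, the policy gradient term only needs dist.log_prob(action),
# so the rest of REINFORCE is unchanged; only the action space differs.
obs = torch.randn(1, 8)
dist = DiscretePolicy(8, 4)(obs)
action = dist.sample()              # integer action index (continuous vector for Normal)
loss_term = -dist.log_prob(action)  # weighted by the return in the actual update
```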
For algorithms that learn a Q function (or a Q function together with a policy), what restricts their use to either discrete or continuous action space environments?
In the same regard, if an algorithm suited to discrete action spaces is to be adapted to handle continuous action spaces, or vice versa, what does such a modification involve?