In several books on reinforcement learning, policy-based methods are motivated by their ability to handle large (continuous) action spaces. Is this the only motivation for policy-based methods? Suppose the action space is tiny (say, only 9 possible actions), but each action costs a huge amount of resources and there is no model of the MDP. Would this also be a good application of policy-based methods?