
Several books on reinforcement learning motivate policy-based methods by their ability to handle large (continuous) action spaces. Is this the only motivation for policy-based methods? What if the action space is tiny (say, only 9 possible actions), but each action costs a huge amount of resources and there is no model of the MDP? Would this also be a good application of policy-based methods?

nbro
tmaric

0 Answers