
If we have a working reward function that produces the desired behavior and an optimal policy in a continuous action/state-space problem, would adding another action significantly affect the resulting optimal policy?

For example, assume you have an RL problem with a 1-dimensional action space (acceleration/deceleration) and a 2-dimensional state space (distance to the target position and velocity), where the agent is tasked with accelerating in a straight line from position a to position b.

Do you think the agent would behave very differently? My assumption is that there would be minimal change beyond a longer training time (given enough exploration), since the task is still to move in a straight line; the agent would simply have to account for the steering action as well now. A sketch of what I mean by the change in spaces is below.
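To make the setup concrete, here is a minimal sketch of the space definitions before and after the change (Gymnasium-style `spaces.Box` objects; the exact bounds and dimensions are just illustrative assumptions, not part of the actual problem):

```python
import numpy as np
from gymnasium import spaces

# Original setup: 1-D continuous action (acceleration/deceleration) and a
# 2-D continuous state (distance to the target position, velocity).
action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(2,), dtype=np.float32)

# With a steering action added: the action space grows to 2 dimensions.
# The state would presumably also need lateral position and lateral velocity,
# otherwise the new action has no observable effect and the problem is no
# longer Markov in the original 2-D state.
action_space_with_steering = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
observation_space_with_steering = spaces.Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)
```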

  • Hi and welcome to AI Stack Exchange. Could you clarify the sequence of events - when you will train the agent, when you make the change to environment? Also what you mean by "effect" - do you expect the trained agent to still work, or that re-training should be guaranteed to work after the change, or something else? – Neil Slater May 15 '22 at 13:12
  • Re-reading your question, it is also not clear what the new action is that you propose to add, and how you would propose to add an action in a new dimension (e.g. steering) without tracking that dimension in the state. – Neil Slater May 15 '22 at 13:16
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community May 16 '22 at 00:06

0 Answers