If we have a working reward function that produces the desired behavior and optimal policy in a continuous action/state-space problem, would adding another action significantly affect the resulting optimal policy?
For example, assume you have an RL problem with a 1-dimensional action space (de/acceleration) and a 2-dimensional state space (distance to the target position and velocity), where the agent is tasked with accelerating in a straight line from position a to b.
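For concreteness, here is a minimal sketch of what I mean by that 1-D setup, assuming a Gymnasium-style environment (the dynamics and reward are just placeholders, not my actual reward function):

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class StraightLineEnv(gym.Env):
    """1-D point mass: accelerate from position a to b."""

    def __init__(self, a=0.0, b=10.0, dt=0.1):
        self.a, self.b, self.dt = a, b, dt
        # single continuous action: de/acceleration
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        # state: distance to the target position and current velocity
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos, self.vel = self.a, 0.0
        return self._obs(), {}

    def step(self, action):
        # simple Euler integration of the point mass
        self.vel += float(action[0]) * self.dt
        self.pos += self.vel * self.dt
        dist = self.b - self.pos
        reward = -abs(dist)                  # placeholder: reward closeness to b
        terminated = abs(dist) < 1e-2
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        return np.array([self.b - self.pos, self.vel], dtype=np.float32)
```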
Now suppose we add a second action, steering. Do you think the agent would behave very differently? My assumption is that there would be minimal change aside from a longer training time (given enough exploration), since the task is still to move in a straight line; the agent would just have to account for the steering action as well now.
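If I'm picturing this right, the main change is just wider action and observation spaces, roughly like the following (a hypothetical extension of the sketch above, with the state gaining a lateral offset and heading):

```python
import numpy as np
from gymnasium import spaces

# hypothetical extended spaces: action = (acceleration, steering),
# state = (distance along track, lateral offset, velocity, heading)
action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)
```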