In reinforcement learning, the state-action value function seems to be used more than the state value function. Why is it so?
-
What do you mean by "more"? Btw, if I understand your question correctly, it is because in a non-deterministic setting, you won't know which action to select simply from the state values. – Hai Nguyen Feb 14 '20 at 07:31
-
@HaiNguyen Thanks. What happens in the case of a deterministic setting? – Bhuwan Bhatt Feb 19 '20 at 16:37
-
If you know which state you will end up in after taking an action (for example, when you walk in a maze you always know your next position after you take a step), you can use the values of the possible next states and pick the action that leads to the state with the highest value. – Hai Nguyen Feb 20 '20 at 11:51
1 Answer
We are ultimately interested in obtaining an optimal policy, that is, the optimal sequence of actions to reach the final goal. State values on their own don't provide that: they tell you the expected return from a specific state onward, but they don't tell you which action to take. To derive the optimal action in a specific state, you would have to simulate all possible actions one step ahead and then pick the action that leads to the state with the highest value. That is often inconvenient or impossible, because it requires a model of the environment's transitions. State-action values attach the expected return to actions rather than states, so you don't need to simulate every action one step ahead and see where you end up; you simply pick the action with the highest value, knowing that it is the best action to take in that state.
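For concreteness, here is a minimal tabular sketch contrasting the two kinds of greedy action selection; the transition model P, reward table R, discount gamma and all the numbers are made-up placeholders, not part of any real problem:

```python
import numpy as np

# Made-up tabular setup: 4 states, 2 actions.
n_states, n_actions = 4, 2
gamma = 0.9                                    # discount factor (assumed)

rng = np.random.default_rng(0)
V = rng.random(n_states)                       # state values V(s)
Q = rng.random((n_states, n_actions))          # state-action values Q(s, a)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P(s' | s, a)
R = rng.random((n_states, n_actions))          # expected immediate reward R(s, a)

def greedy_from_v(s):
    # With only V you need the model (P and R) to look one step ahead:
    # argmax_a  sum_s' P(s'|s,a) * [R(s,a) + gamma * V(s')]
    one_step = R[s] + gamma * P[s] @ V
    return int(np.argmax(one_step))

def greedy_from_q(s):
    # With Q the greedy action is a direct lookup; no model is needed.
    return int(np.argmax(Q[s]))

print(greedy_from_v(0), greedy_from_q(0))
```

The point of the sketch is only the contrast between the two functions: `greedy_from_v` cannot be written without access to `P` and `R`, while `greedy_from_q` needs nothing beyond the Q-table itself.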

-
It may also be worth emphasizing that, if you are using a table, the state-action value function is larger than the state value function (one entry per state-action pair rather than per state), so it requires more memory to store. – nbro Feb 14 '20 at 14:20