In reinforcement learning, when we talk about the principle of optimality, do we assume the policy to be deterministic?

nbro

hakiki_makato
- Hi. Can you clarify what your question is? Is it "When we talk about the principle of optimality, do we assume the policy to be deterministic?" – nbro Aug 18 '20 at 10:14
- Yes, that's what I meant. – hakiki_makato Aug 18 '20 at 10:17
- Which "principle of optimality" are you talking about exactly? Where did you read this term? Could you provide a reference to the book? Are you talking about the optimality of _value iteration_ and/or _policy iteration_? I have heard the term "principle of optimality" only in the context of dynamic programming algorithms. – nbro Aug 18 '20 at 10:19
- Yes, I'm reading Sutton and Barto's _Reinforcement Learning: An Introduction_. I'm asking in the context of DP only. – hakiki_makato Aug 18 '20 at 10:22