Do we assume the policy to be deterministic when proving optimality?

In reinforcement learning, when we talk about the principle of optimality, do we assume the policy to be deterministic?

edited Aug 18 '20 at 10:18

nbro

asked Aug 18 '20 at 09:32

hakiki_makato

Hi. Can you clarify what your question is? Is it "when we talk about the principle of optimality, do we assume policy to be deterministic?" – nbro Aug 18 '20 at 10:14
Yes, that's what I meant – hakiki_makato Aug 18 '20 at 10:17
Which "principle of optimality" are you talking about exactly? Where did you read this term? Maybe provide a reference to the book? Are you talking about the optimality of _value iteration_ and/or _policy iteration_? I've only heard the term "principle of optimality" in the context of dynamic programming algorithms. – nbro Aug 18 '20 at 10:19
Yes, I'm reading Sutton and Barto's _Reinforcement Learning: An Introduction_. I'm asking in the context of DP only. – hakiki_makato Aug 18 '20 at 10:22

0 Answers