Proof that there always exists a dominating policy in an MDP

Asked Jul 16 '21 at 16:21

Active Jul 16 '21 at 17:52

Viewed 95 times

I think that it is common knowledge that for any infinite horizon discounted MDP $(S, A, P, r, \gamma)$, there always exists a dominating policy $\pi$, i.e. a policy $\pi$ such that for all policies $\pi'$: $$V_\pi (s) \geq V_{\pi'}(s) \quad \text{for all } s\in S .$$

However, I could not find a proof of this result anywhere. Given that this statement is fundamental for dynamic programming (I think), I am interested in a rigorous proof. (I hope that I am not missing anything trivial here)

edited Jul 16 '21 at 17:52

nbro

39,006
12
98
176

asked Jul 16 '21 at 16:21

MMM

2

For what it's worth, a proof of this fact may be found in Chapter 6 of the book "Markov decision processes: Discrete stochastic dynamic programming" by Martin Puterman. To reproduce it here would take a quite a bit of effort due to the context required, I'm afraid. – mikkola Jul 20 '21 at 10:38
@mikkola Maybe you can just sketch the idea of the proof, if you are familiar with it. I guess that would be enough for a formal answer (in addition to pointing to that chapter). – nbro Dec 20 '21 at 14:52

Proof that there always exists a dominating policy in an MDP

0 Answers0