6

I think that it is common knowledge that for any infinite horizon discounted MDP $(S, A, P, r, \gamma)$, there always exists a dominating policy $\pi$, i.e. a policy $\pi$ such that for all policies $\pi'$: $$V_\pi (s) \geq V_{\pi'}(s) \quad \text{for all } s\in S .$$

However, I could not find a proof of this result anywhere. Given that this statement is fundamental for dynamic programming (I think), I am interested in a rigorous proof. (I hope that I am not missing anything trivial here)

nbro
  • 39,006
  • 12
  • 98
  • 176
MMM
  • 185
  • 3
  • 2
    For what it's worth, a proof of this fact may be found in Chapter 6 of the book "Markov decision processes: Discrete stochastic dynamic programming" by Martin Puterman. To reproduce it here would take a quite a bit of effort due to the context required, I'm afraid. – mikkola Jul 20 '21 at 10:38
  • @mikkola Maybe you can just sketch the idea of the proof, if you are familiar with it. I guess that would be enough for a formal answer (in addition to pointing to that chapter). – nbro Dec 20 '21 at 14:52

0 Answers0