
We can use a model-free Monte Carlo approach to solve an MDP $(S,A,R,P,\gamma)$ whose transition dynamics $P$ are unknown. The idea is to estimate Q-values by rolling out trajectories from randomly chosen starting state-action pairs $(s_0, a_0) \in S \times A$, and to improve the policy $\pi$ greedily with respect to those estimates. This is the Monte Carlo Exploring Starts (MC ES) algorithm in Sutton and Barto (2nd edition, p. 99).
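For concreteness, here is a minimal Python sketch of MC ES with first-visit updates and sample-average Q-values. It runs on a hypothetical toy corridor MDP: `step`, `N_STATES` and `SLIP` are made up for illustration and do not come from Sutton and Barto. The agent only sees sampled transitions from `step`, never $P$ itself. The `max_steps` cap is also an addition, because a greedy deterministic policy may never reach the terminal state.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP (assumption, for illustration only): a 1-D corridor of
# N_STATES cells; the rightmost cell is terminal. Action 0 = left, 1 = right.
# Every step costs -1, and with probability SLIP the move goes the other way.
N_STATES = 6
TERMINAL = N_STATES - 1
ACTIONS = (0, 1)
SLIP = 0.1


def step(state, action, rng):
    """Sample (next_state, reward). The learner never sees P, only these samples."""
    if rng.random() < SLIP:
        action = 1 - action
    move = 1 if action == 1 else -1
    next_state = min(max(state + move, 0), TERMINAL)
    return next_state, -1.0


def mc_exploring_starts(num_episodes=20000, gamma=1.0, max_steps=50, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)     # Q(s, a) estimates (unvisited pairs read as 0.0)
    counts = defaultdict(int)  # number of returns averaged into Q(s, a)
    policy = {s: rng.choice(ACTIONS) for s in range(TERMINAL)}  # arbitrary initial pi

    for _ in range(num_episodes):
        # Exploring start: every nonterminal (s0, a0) pair has positive probability.
        state = rng.randrange(TERMINAL)
        action = rng.choice(ACTIONS)
        episode = []
        # Assumption: cap the rollout length; the textbook assumes episodes terminate.
        for _ in range(max_steps):
            next_state, reward = step(state, action, rng)
            episode.append((state, action, reward))
            if next_state == TERMINAL:
                break
            state, action = next_state, policy[next_state]

        # Index of the first occurrence of each (s, a) pair in this episode.
        first_visit = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)

        # Backward pass: accumulate the return G, update Q on first visits only,
        # then make pi greedy with respect to the updated Q at that state.
        g = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s, a, r = episode[t]
            g = gamma * g + r
            if first_visit[(s, a)] == t:
                counts[(s, a)] += 1
                q[(s, a)] += (g - q[(s, a)]) / counts[(s, a)]  # running mean of returns
                policy[s] = max(ACTIONS, key=lambda b: q[(s, b)])
    return q, policy


q, policy = mc_exploring_starts()
print(policy)
```

In this toy problem the optimal policy moves right in every cell. With enough episodes the greedy policy should settle there, although this sketch only illustrates the algorithm and says nothing about how many samples that takes in general. The number of calls to `step` made before `policy` becomes optimal is the kind of quantity a sample complexity bound would control.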

Does anyone know if there is a sample complexity result for this algorithm?

nbro
Snowball
