We can solve an MDP $(S, A, R, P, \gamma)$ with unknown transition dynamics $P$ using a model-free Monte Carlo approach: estimate Q-values by rolling out trajectories from randomly chosen initial state-action pairs $(s_0, a_0) \in S \times A$, then greedily improve the policy $\pi$ with respect to those estimates. This is the Monte Carlo Exploring Starts (MCES) algorithm on page 99 of Sutton and Barto, 2nd edition.
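
For concreteness, here is roughly what I have in mind: a minimal first-visit Monte Carlo ES sketch in Python on a toy chain MDP. The toy dynamics, the black-box sampler `step`, and the episode cap `max_len` are just assumptions for illustration; the agent only interacts with the sampler (so the setting is model-free), and, following the book's pseudocode, each exploring start draws a random state-action pair.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

# Toy episodic MDP (illustrative only): a small random chain with an
# absorbing terminal state.  The agent never looks at P or R directly.
n_states, n_actions, gamma = 5, 2, 0.9
TERMINAL = n_states - 1
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # hidden dynamics
R = rng.normal(size=(n_states, n_actions))                        # hidden rewards

def step(s, a):
    """Black-box sampler: return (reward, next_state) for taking a in s."""
    return R[s, a], rng.choice(n_states, p=P[s, a])

def run_mces(n_episodes=20_000, max_len=50):
    Q = np.zeros((n_states, n_actions))
    returns = defaultdict(list)                  # (s, a) -> observed returns
    pi = rng.integers(n_actions, size=n_states)  # arbitrary initial policy

    for _ in range(n_episodes):
        # Exploring start: a uniformly random non-terminal state-action pair.
        s, a = rng.integers(TERMINAL), rng.integers(n_actions)
        episode = []
        for _ in range(max_len):                 # cap length in case we never absorb
            r, s_next = step(s, a)
            episode.append((s, a, r))
            if s_next == TERMINAL:
                break
            s, a = s_next, pi[s_next]            # thereafter follow the current policy

        # First-visit Monte Carlo evaluation, then greedy policy improvement.
        first_visit = {}
        for t, (s_t, a_t, _) in enumerate(episode):
            first_visit.setdefault((s_t, a_t), t)
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s_t, a_t, r_t = episode[t]
            G = gamma * G + r_t
            if first_visit[(s_t, a_t)] == t:
                returns[(s_t, a_t)].append(G)
                Q[s_t, a_t] = np.mean(returns[(s_t, a_t)])
                pi[s_t] = int(np.argmax(Q[s_t]))
    return Q, pi

Q, pi = run_mces()
print("greedy policy:", pi)
```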
Does anyone know if there is a sample complexity result for this algorithm?