Can we also estimate $V_{\pi}$ with SARSA?

Question

For SARSA, I know we can estimate the action value $Q(s,a)$, and the relationship between $V(s)$ and $Q(s,a)$ is $V_{\pi}(s) = \sum_{a \in \mathcal{A}} \pi(a|s)Q_{\pi} (s,a)$.

So my question is, can we simply estimate $V_{\pi}$ by applying the above equation to the $Q_{\pi}$ that we derived from SARSA? Will there be any restrictions to prevent estimating $V_{\pi}$ through SARSA?

Neil Slater · Answer 1 · 2022-05-20T14:13:20.887

What you suggest will work. The main restriction is that you need to know $\pi$ fully, i.e. the action probabilities $\pi(a|s)$ in every state, in order to perform the conversion.
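As a minimal sketch of that conversion, assuming a tabular problem where both $Q$ and $\pi$ fit in arrays of shape `(n_states, n_actions)`: the helper names `v_from_q` and `epsilon_greedy_probs` and the numbers are made up for illustration, and the $\epsilon$-greedy policy is only an assumption about how $\pi$ might be defined (it is a common choice with SARSA, but any known $\pi$ works the same way).

```python
import numpy as np

def v_from_q(Q, pi):
    """Return V(s) = sum_a pi(a|s) * Q(s, a) for every state.

    Q and pi are arrays of shape (n_states, n_actions); each row of pi
    must be a probability distribution over actions.
    """
    Q = np.asarray(Q, dtype=float)
    pi = np.asarray(pi, dtype=float)
    if Q.shape != pi.shape:
        raise ValueError("Q and pi must have the same shape")
    return (pi * Q).sum(axis=1)

def epsilon_greedy_probs(Q, epsilon):
    """Action probabilities of an epsilon-greedy policy w.r.t. Q (assumed policy form)."""
    Q = np.asarray(Q, dtype=float)
    n_states, n_actions = Q.shape
    # Every action gets the exploration share epsilon / n_actions ...
    pi = np.full((n_states, n_actions), epsilon / n_actions)
    # ... and the greedy action gets the remaining 1 - epsilon on top.
    pi[np.arange(n_states), Q.argmax(axis=1)] += 1.0 - epsilon
    return pi

# Hypothetical Q table with 2 states and 2 actions.
Q = np.array([[1.0, 3.0],
              [0.0, 2.0]])
pi = epsilon_greedy_probs(Q, epsilon=0.2)
V = v_from_q(Q, pi)
```

Here each row of `pi` is `[0.1, 0.9]`, so `V` comes out as `[2.8, 1.8]`. Note that the result is only as good as the $Q$ estimate and is the value of whichever $\pi$ you plug in.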

If you know that you are going to be estimating $V_{\pi}$ from the start, and have a fixed policy, then you could use basic TD learning instead of SARSA, where the update rule is:

$$V(s) \leftarrow V(s) + \alpha(r + \gamma V(s') - V(s))$$

Doing this would allow you to estimate $V_{\pi}$ directly from observed transitions $(s, r, s')$ generated by following $\pi$, without knowing the action probabilities of $\pi$.
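A rough sketch of that update rule in code, under some assumptions of mine: a tabular `V` stored in a dict, a hypothetical two-state episode `A -> B -> end` with fixed rewards, and the usual convention that the value of a terminal state is zero. The function name `td0_update` and the transition data are illustrative only.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)) in place.

    If s_next is terminal, its value is taken to be 0 (assumed convention).
    """
    target = r + (0.0 if terminal else gamma * V[s_next])
    V[s] += alpha * (target - V[s])
    return V

# Hypothetical observed transitions (s, r, s', done) produced by some fixed policy.
# The policy's action probabilities never appear anywhere in the update.
transitions = [("A", 0.0, "B", False),
               ("B", 1.0, "end", True)]

V = {"A": 0.0, "B": 0.0}
for _ in range(1000):  # replay the same episode many times
    for s, r, s_next, done in transitions:
        td0_update(V, s, r, s_next, terminal=done)
```

With these deterministic transitions the estimates settle at `V["B"]` close to 1.0 and `V["A"]` close to 0.9, which is what the Bellman equation gives for $\gamma = 0.9$. In a stochastic environment you would normally also decay `alpha` over time.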
