
In the paper "Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems" (page 1083, 6th line from the bottom), the authors define the expectation under the empirical model as $$\hat{\mathbb{E}}_{s,s',a}[V(s')] = \sum_{s' \in S} \hat{P}^{a}_{s, s'}V(s').$$ I don't understand the significance of this quantity, since it puts $V(s')$ inside an expectation while assuming knowledge of $V(s')$ in the definition on the right.

A clarification in this regard would be appreciated.

EDIT: The paper defines $\hat{P}^{a}_{s, s'}$ as $$\hat{P}^{a}_{s, s'} = \frac{|(s, a, s', t)|}{|(s, a, t)|},$$ where $|(s, a, t)|$ is the number of times state $s$ was visited and action $a$ was taken, and $|(s, a, s', t)|$ is the number of those $|(s, a, t)|$ visits to $(s, a)$ after which the next state was $s'$ during model learning.

No explicit definition of $V$ is provided; however, $V^{\pi}$ is defined as the usual expected discounted return, following the same definition as Sutton and Barto and other sources.
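For concreteness, here is how I read these count-based definitions, as a small sketch (the array names and the sample transitions are my own, not from the paper): estimate $\hat{P}^{a}_{s,s'}$ from transition counts, then form $\sum_{s'} \hat{P}^{a}_{s,s'} V(s')$.

```python
import numpy as np

n_states, n_actions = 3, 2
counts = np.zeros((n_states, n_actions, n_states))  # |(s, a, s', t)|

# Hypothetical transitions (s, a, s') observed during model learning.
transitions = [(0, 1, 2), (0, 1, 2), (0, 1, 1), (2, 0, 0)]
for s, a, s_next in transitions:
    counts[s, a, s_next] += 1

sa_counts = counts.sum(axis=2, keepdims=True)        # |(s, a, t)|
P_hat = np.divide(counts, sa_counts,                 # empirical P(s' | s, a)
                  out=np.zeros_like(counts), where=sa_counts > 0)

V = np.array([1.0, 2.0, 3.0])                        # some known value per state
E_hat = P_hat @ V   # E_hat[s, a] = sum_{s'} P_hat[s, a, s'] * V[s']
print(E_hat[0, 1])  # empirical expected value of V at the next state from (s=0, a=1)
```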

ijuneja
  • I think I can interpret this if the LHS was $\mathbb{\hat{E}}_{s,a}[V(s')]$... does the paper definitely show $\mathbb{\hat{E}}_{s,s', a}$? Does the paper define $V$ in this context? – Neil Slater Jul 21 '20 at 13:20
  • @NeilSlater the paper does use $s, s', a$ in the notation. I have edited to add details. – ijuneja Jul 21 '20 at 13:39

1 Answer


If I understand your question correctly, the significance of this quantity comes from the fact that $s'$ is random. On the RHS of the equation, $V(\cdot)$ is assumed to be known for each state; what is being measured is the expected value of $V$ at the next state, given the current state and action, under the empirical transition model $\hat{P}$.
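To make that concrete, here is a tiny numerical sketch (the numbers are made up): $V$ is a fixed, known function of the states, while the next state $s'$ is random, so the sum is just the mean of $V(s')$ under the empirical transition probabilities for one fixed $(s, a)$.

```python
import numpy as np

V = np.array([0.0, 5.0, 10.0])        # known value for each state
P_hat_sa = np.array([0.2, 0.5, 0.3])  # empirical P(s' | s, a) for one fixed (s, a)

E_hat = np.dot(P_hat_sa, V)           # = 0.2*0 + 0.5*5 + 0.3*10 = 5.5
print(E_hat)
```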

harwiltz