In the paper "Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems", on page 1083, on the 6th line from the bottom, the authors define the expectation under the empirical model as $$\hat{\mathbb{E}}_{s,s',a}[V(s')] = \sum_{s' \in S} \hat{P}^{a}_{s, s'}V(s').$$ I don't understand the significance of this quantity, since it places $V(s')$ inside an expectation while assuming knowledge of $V(s')$ on the right-hand side of the definition.
A clarification in this regard would be appreciated.
EDIT: The paper defines $\hat{P}^{a}_{s, s'}$ as $$\hat{P}^{a}_{s, s'} = \frac{|(s, a, s', t)|}{|(s, a, t)|},$$ where $|(s, a, t)|$ is the number of times state $s$ was visited and action $a$ was taken, and $|(s, a, s', t)|$ is the number of times, among those $|(s, a, t)|$ visits to $(s, a)$, that the next state was $s'$ during model learning.
No explicit definition of $V$ is provided; however, $V^{\pi}$ is defined as the usual expected discounted return, following the same definition as Sutton and Barto and other standard sources.
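For concreteness, here is a minimal sketch (in Python, with made-up counts and values that are not from the paper) of how I currently read these two definitions: the visit counts give the empirical transition probabilities, and the quantity above would then just be the expectation of some $V$ under that learned model.

```python
import numpy as np

# Hypothetical example: fix one (s, a) pair and suppose there are 3 possible next states.
# counts[s'] plays the role of |(s, a, s', t)|: how often taking action a in state s
# led to next state s' during model learning.
counts = np.array([4, 1, 5])      # |(s, a, s', t)| for each s' (arbitrary numbers)
n_visits = counts.sum()           # |(s, a, t)|: total visits to (s, a)

# Empirical transition probabilities \hat{P}^a_{s, s'} = |(s, a, s', t)| / |(s, a, t)|
P_hat = counts / n_visits

# Some candidate value function V over next states (values chosen arbitrarily).
V = np.array([0.0, 10.0, 2.0])

# Expectation of V under the empirical model:
# \hat{E}[V(s')] = sum_{s'} \hat{P}^a_{s, s'} V(s')
E_hat = P_hat @ V
print(P_hat, E_hat)               # [0.4 0.1 0.5] 2.0
```

Is this reading correct, and if so, which $V$ is meant to be plugged in here, given that no explicit $V$ is defined?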