I have been seeing expectations written with subscripts, such as $E_{s_0 \sim D}[V^\pi (s_0)] = \sum_{t=0}^\infty[\gamma^t\phi(s_t)]$, taken from https://ai.stanford.edu/~ang/papers/icml04-apprentice.pdf, and $Q^\pi(s,a,R) = R(s) + \gamma E_{s'\sim T(s,a,\cdot)}[V^\pi(s',R)]$, from the Bayesian IRL paper (https://www.aaai.org/Papers/IJCAI/2007/IJCAI07-416.pdf).

I understand that $s_0 \sim D$ means that the starting state $s_0$ is drawn from a distribution of starting states $D$. But how should I read the subscript $s'\sim T(s,a,\cdot)$ in the second equation? How is $s'$ drawn from a distribution of transition probabilities?

nbro
calveeen

1 Answer

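The dot ($\cdot$) at the end of $T(s,a,\cdot)$ is a placeholder for the next state: it ranges over all states $s'$ that can be reached from state $s$ by taking action $a$. Each of these states has a transition probability $T(s,a,s')$, and these probabilities sum to 1. Hence, for fixed $s$ and $a$, $T(s,a,\cdot)$ is a probability distribution over next states, and $s' \sim T(s,a,\cdot)$ means that $s'$ is sampled from it. For a finite state space, the expectation is then $E_{s'\sim T(s,a,\cdot)}[V^\pi(s',R)] = \sum_{s'} T(s,a,s')\,V^\pi(s',R)$.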
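To make this concrete, here is a minimal sketch in Python. It assumes a made-up MDP with three states and two actions; the table `T`, the values `V`, and all function names are hypothetical and are not taken from either paper. It shows that fixing $s$ and $a$ picks out one row of probabilities, and that the expectation can be computed either exactly as a weighted sum or approximately by sampling $s'$ from that row.

```python
import random

# Hypothetical transition table: T[s][a][s_next] = P(s_next | s, a).
# For every fixed (s, a), the inner list is a distribution over next states.
T = [
    [[0.7, 0.2, 0.1], [0.0, 0.5, 0.5]],  # from state 0
    [[0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],  # from state 1
    [[0.0, 0.0, 1.0], [0.5, 0.0, 0.5]],  # from state 2
]

# Made-up state values standing in for V^pi(s').
V = [1.0, 2.0, 4.0]


def next_state_distribution(T, s, a):
    """Return T(s, a, .): the probabilities of each next state."""
    return T[s][a]


def expected_next_value(T, V, s, a):
    """Exact expectation: sum over s' of T(s, a, s') * V(s')."""
    probs = next_state_distribution(T, s, a)
    return sum(p * v for p, v in zip(probs, V))


def sampled_next_value(T, V, s, a, n_samples, rng):
    """Monte Carlo estimate: draw s' ~ T(s, a, .) and average V(s')."""
    probs = next_state_distribution(T, s, a)
    draws = rng.choices(range(len(probs)), weights=probs, k=n_samples)
    return sum(V[s_next] for s_next in draws) / n_samples


rng = random.Random(0)  # fixed seed so the run is repeatable
exact = expected_next_value(T, V, s=0, a=0)
estimate = sampled_next_value(T, V, s=0, a=0, n_samples=100_000, rng=rng)
print(exact, estimate)
```

With enough samples the estimate approaches the exact weighted sum, which is all the subscript $s'\sim T(s,a,\cdot)$ is saying: the average is taken over next states weighted by their transition probabilities.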

OmG