
In the maximum entropy inverse reinforcement learning paper, Ziebart et al. show that the state visitation frequency $\rho(s)$ of a state $s$ can be computed as $$ \rho_{\pi}(s) = \sum_{t=1}^{T} P(s_t=s|\pi), $$ which is the sum, over all time steps, of the probability that state $s$ is visited at time $t$.

I just don't understand why it is a sum. From my perspective, a frequency should be less than one, so it should be the average value $$ \rho_{\pi}(s) = \frac{1}{T}\sum_{t=1}^{T} P(s_t=s|\pi). $$
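For concreteness, here is a minimal sketch of the two quantities side by side; the 3-state chain, the policy-induced transition matrix, the initial distribution, and the horizon are all made up for illustration and do not come from the paper:

```python
import numpy as np

# Hypothetical policy-induced transition matrix P_pi[s, s'] = P(s_{t+1}=s' | s_t=s, pi)
# and initial state distribution mu0; both are made-up illustration values.
P_pi = np.array([[0.9, 0.1, 0.0],
                 [0.0, 0.8, 0.2],
                 [0.1, 0.0, 0.9]])
mu0 = np.array([1.0, 0.0, 0.0])
T = 50

d_t = mu0.copy()           # P(s_0 = s | pi)
rho_sum = np.zeros(3)      # accumulates sum_t P(s_t = s | pi)
for _ in range(T):
    rho_sum += d_t         # add this step's state probabilities
    d_t = d_t @ P_pi       # propagate one step to P(s_{t+1} = s | pi)

print(rho_sum)        # summed version: expected number of visits, sums to T
print(rho_sum / T)    # averaged version: a proper distribution, sums to 1
```

The two only differ by the constant factor $1/T$, which is what the question is about.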

skypitcher
  • My feeling is that they define the first equation and will then normalise it to make it a state distribution. – David Apr 26 '21 at 08:34
  • It should be the average, and this is rarely mentioned except in an IRL summer camp at UCB. You can check this [GithubIssue](https://github.com/yrlu/irl-imitation/issues/1#issuecomment-532552252) for details. – skypitcher Apr 28 '21 at 08:08

1 Answer


The equation you show does not appear in Ziebart et al. (2008). They do, however, provide a description of the computation in Algorithm 1.

It is a visitation frequency, i.e. an expected number of visits, not a probability distribution, so it does not need to be averaged.

If you look at Equation 2 in Arora & Doshi (2020), you find a formulation that describes Algorithm 1 quite well:

$\phi^\pi(s) = \phi^0(s) + \sum_{s'\in\mathcal{S}}P(s,\pi(s),s')\phi^\pi(s')$.

I am not entirely satisfied with this formulation because, in my opinion, there should also be a summation over $a\in \mathcal{A}$, as in the definition of $\eta(s)$, the expected number of visits, in Equation 9.2 of Sutton & Barto (2020):

$\eta(s)=h(s)+\sum_{\bar{s}}\eta(\bar{s})\sum_a\pi(a|\bar{s})p(s|\bar{s},a)$.
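As a rough sketch of how this recursion connects back to the sum in the question: starting from $\eta = 0$, the $k$-th sweep of the update yields the $k$-step truncation of $\sum_t P(s_t=s|\pi)$, so after $T$ sweeps you get exactly the finite-horizon sum. The dynamics, policy, and horizon below are hypothetical values chosen only for illustration, not taken from any of the cited papers:

```python
import numpy as np

n_states, n_actions = 3, 2
rng = np.random.default_rng(0)

# p[s_bar, a, s] = p(s | s_bar, a); rows normalized over the last axis.
p = rng.random((n_states, n_actions, n_states))
p /= p.sum(axis=-1, keepdims=True)

# pi[s_bar, a] = pi(a | s_bar), a stochastic policy.
pi = rng.random((n_states, n_actions))
pi /= pi.sum(axis=-1, keepdims=True)

h = np.array([1.0, 0.0, 0.0])   # initial state distribution h(s)

# Policy-induced state-to-state matrix: P_pi[s_bar, s] = sum_a pi(a|s_bar) p(s|s_bar, a)
P_pi = np.einsum("ba,bas->bs", pi, p)

# T sweeps of eta(s) = h(s) + sum_{s_bar} eta(s_bar) sum_a pi(a|s_bar) p(s|s_bar, a),
# starting from eta = 0, reproduce the T-step sum of visitation probabilities.
T = 50
eta = np.zeros(n_states)
for _ in range(T):
    eta = h + eta @ P_pi

print(eta)       # expected number of visits to each state (sums to T)
print(eta / T)   # normalized version from the question (sums to 1)
```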

To summarize, in all three descriptions, you just calculate how often a state is visited by policy $\pi$.