Questions tagged [on-policy-distribution]
2 questions
4 votes, 1 answer
What is the difference between an on-policy distribution and state visitation frequency?
The on-policy distribution is defined as follows in Sutton and Barto:
On the other hand, the state visitation frequency is defined as follows in Trust Region Policy Optimization:
$$\rho_{\pi}(s) = \sum_{t=0}^{T} \gamma^t P(s_t=s|\pi)$$
Question: What is…
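
To make the formula above concrete, here is a minimal sketch that evaluates $\rho_{\pi}(s)$ for a small Markov chain by propagating the state distribution forward and accumulating the discounted terms. The function name `discounted_visitation`, the initial distribution `p0`, the transition matrix `P`, and the two-state example are all assumptions made for illustration; none of them come from the question or from either reference.

```python
def discounted_visitation(p0, P, gamma, T):
    """Return rho[s] = sum_{t=0}^{T} gamma**t * Pr(s_t = s) for a fixed policy.

    p0[s]    : probability that the episode starts in state s (assumed given)
    P[s][s2] : probability of moving from s to s2 under the policy (assumed given)
    """
    n = len(p0)
    dist = list(p0)          # Pr(s_t = s), starting at t = 0
    rho = [0.0] * n
    discount = 1.0           # gamma**t
    for _ in range(T + 1):
        for s in range(n):
            rho[s] += discount * dist[s]
        # Push the state distribution one step forward: dist <- dist @ P
        dist = [sum(dist[s] * P[s][s2] for s in range(n)) for s2 in range(n)]
        discount *= gamma
    return rho


# Hypothetical two-state chain: state 0 stays put with probability 0.5,
# otherwise moves to state 1, which is absorbing.
p0 = [1.0, 0.0]
P = [[0.5, 0.5],
     [0.0, 1.0]]
rho = discounted_visitation(p0, P, gamma=0.9, T=50)

# The raw sum is not a probability distribution; dividing by its total gives one.
total = sum(rho)
normalized = [r / total for r in rho]
```

Note that the entries of `rho` add up to $\sum_{t=0}^{T} \gamma^t = (1-\gamma^{T+1})/(1-\gamma)$ rather than 1, so the quantity as written is an unnormalized, discounted count of visits; `normalized` rescales it to sum to 1.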

user529295
3 votes, 2 answers
In the on-policy state distribution for episodic tasks, why don't we take into account the length of the episode?
In Sutton & Barto's "Reinforcement Learning: An Introduction", 2nd edition, page 199, they describe the on-policy distribution for episodic tasks in the following box:
I don't understand how this can be done without taking the length of the episode…

user118967