Questions tagged [on-policy-distribution]

2 questions


1 answer

What is the difference between an on-policy distribution and state visitation frequency?

The on-policy distribution is defined as follows in Sutton and Barto: On the other hand, the state visitation frequency is defined as follows in Trust Region Policy Optimization: $$\rho_{\pi}(s) = \sum_{t=0}^{T} \gamma^t P(s_t=s|\pi)$$ Question: What is…

asked Dec 08 '21 at 10:36

user529295
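A minimal sketch contrasting the two quantities. It assumes a small, hypothetical episodic MDP in which the policy is already folded into a state-to-state matrix `P` (each row sums to less than 1; the missing mass is the termination probability) with start distribution `h`. It also assumes Sutton and Barto's standard episodic definition $\eta(s) = h(s) + \sum_{\bar s} \eta(\bar s) \sum_a \pi(a|\bar s)\, p(s|\bar s, a)$ and $\mu(s) = \eta(s) / \sum_{s'} \eta(s')$, and computes $\rho_\pi$ with the finite-horizon formula quoted above. The names `step`, `visitation` and `on_policy_distribution` are illustrative only.

```python
# Hypothetical 3-state episodic MDP with the policy already folded in:
# P[s][s2] = sum_a pi(a|s) * p(s2|s, a). Each row sums to < 1; the
# missing mass is the probability of terminating from that state.
P = [
    [0.0, 0.9, 0.0],
    [0.5, 0.0, 0.4],
    [0.0, 0.0, 0.5],
]
h = [1.0, 0.0, 0.0]  # start-state distribution h(s)


def step(p, P):
    """Propagate one step: P(s_{t+1} = s2) = sum_s P(s_t = s) * P[s][s2]."""
    n = len(p)
    return [sum(p[s] * P[s][s2] for s in range(n)) for s2 in range(n)]


def visitation(h, P, gamma=1.0, T=2000):
    """rho(s) = sum_{t=0}^{T} gamma^t P(s_t = s); with gamma = 1 this is eta(s)."""
    p = list(h)  # P(s_0 = s)
    rho = [0.0] * len(h)
    discount = 1.0
    for _ in range(T + 1):
        rho = [r + discount * q for r, q in zip(rho, p)]
        p = step(p, P)
        discount *= gamma
    return rho


def on_policy_distribution(h, P, T=2000):
    """Episodic on-policy distribution: mu(s) = eta(s) / sum_s' eta(s')."""
    eta = visitation(h, P, gamma=1.0, T=T)  # undiscounted expected visits
    total = sum(eta)
    return [e / total for e in eta], eta


mu, eta = on_policy_distribution(h, P)
rho = visitation(h, P, gamma=0.9)

print("eta (expected visits per episode):", [round(x, 4) for x in eta])
print("mu  (normalized, sums to 1):      ", [round(x, 4) for x in mu])
print("rho (gamma = 0.9, unnormalized):  ", [round(x, 4) for x in rho])
```

Under these assumptions the two differ in exactly two ways: $\mu$ is undiscounted and normalized to sum to 1, while $\rho_\pi$ is discounted and left unnormalized. With $\gamma = 1$ and a horizon long enough for episodes to end, $\rho_\pi$ reduces to $\eta$, i.e. $\mu$ is just $\rho_\pi$ rescaled.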


2 answers

In the on-policy state distribution for episodic tasks, why don't we take into account the length of the episode?

In Sutton & Barto's "Reinforcement Learning: An Introduction", 2nd edition, page 199, they describe the on-policy distribution for episodic tasks in the following box: I don't understand how this can be done without taking the length of the episode…

reinforcement-learning sutton-barto episodic-tasks on-policy-distribution

asked Nov 19 '19 at 03:32

user118967
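A minimal sketch of why the episode length does not need to appear explicitly, assuming the box is the standard episodic definition in which $\eta(s)$ is the expected number of time steps spent in $s$ during one episode. Because $\eta$ already counts visits over the whole episode, $\sum_s \eta(s)$ is the expected episode length, and dividing by it in $\mu(s) = \eta(s)/\sum_{s'}\eta(s')$ is where the length enters. The MDP below (matrix `P`, start distribution `h`) and the helpers `solve_eta`, `sample_next` and `simulate` are hypothetical; the Monte Carlo check pools all time steps from episodes of random, unequal length.

```python
import random

# Hypothetical episodic MDP with the policy folded in: P[s][s2] is the
# probability of moving s -> s2; missing row mass = termination probability.
P = [
    [0.0, 0.9, 0.0],
    [0.5, 0.0, 0.4],
    [0.0, 0.0, 0.5],
]
h = [1.0, 0.0, 0.0]
N_STATES = len(h)


def solve_eta(h, P, iters=1000):
    """Fixed-point iteration of eta(s) = h(s) + sum_sbar eta(sbar) * P[sbar][s]."""
    eta = [0.0] * N_STATES
    for _ in range(iters):
        eta = [h[s] + sum(eta[sb] * P[sb][s] for sb in range(N_STATES))
               for s in range(N_STATES)]
    return eta


def sample_next(s, rng):
    """Return the next state, or None if the episode terminates from s."""
    u = rng.random()
    acc = 0.0
    for s2, p in enumerate(P[s]):
        acc += p
        if u < acc:
            return s2
    return None  # u fell into the termination mass


def simulate(n_episodes, seed=0):
    """Count visits per state and total steps over many episodes."""
    rng = random.Random(seed)
    visits = [0] * N_STATES
    total_steps = 0
    for _ in range(n_episodes):
        s = rng.choices(range(N_STATES), weights=h)[0]
        while s is not None:  # each episode has its own random length
            visits[s] += 1
            total_steps += 1
            s = sample_next(s, rng)
    return visits, total_steps


eta = solve_eta(h, P)
mu = [e / sum(eta) for e in eta]
visits, total_steps = simulate(20000)

print("expected episode length sum(eta):", round(sum(eta), 3))
print("empirical mean episode length:   ", round(total_steps / 20000, 3))
print("mu:                   ", [round(m, 3) for m in mu])
print("fraction of all steps:", [round(v / total_steps, 3) for v in visits])
```

In this sketch the fraction of all pooled time steps spent in each state approaches $\mu(s)$ = E[visits to $s$] / E[episode length], so varying episode lengths are handled by the normalization rather than by a separate length term.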