Why is the actor-critic algorithm limited to using on-policy data?

Question

Why is the actor-critic algorithm limited to using on-policy data? Or can we use the actor-critic algorithm with off-policy data?

score 1 · Answer 1 · edited Feb 15 '19 at 19:43


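It's because, in the actor-critic algorithm, the objective function is an expectation over trajectories $\tau$ generated by running the current policy. Data collected by a different policy is drawn from a different trajectory distribution, so averaging over it does not estimate that objective. If we want to use off-policy data, we have to resort to importance sampling: each sample is reweighted by the ratio of its probability under the current policy to its probability under the policy that collected it.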
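To make the expectation and the importance-sampling correction concrete, here is a short sketch in one common notation. The symbols are assumptions introduced for illustration, not taken from the question: $\tau = (s_0, a_0, \dots, s_T)$ is a trajectory, $R(\tau)$ its return, $\pi_\theta$ the current policy, $\beta$ the behavior policy that collected the data, and $p_\theta(\tau)$, $p_\beta(\tau)$ the trajectory distributions they induce.

$$J(\theta) = \mathbb{E}_{\tau \sim p_\theta(\tau)}\left[R(\tau)\right], \qquad p_\theta(\tau) = p(s_0)\prod_{t=0}^{T-1} \pi_\theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t)$$

$$J(\theta) = \mathbb{E}_{\tau \sim p_\beta(\tau)}\left[\frac{p_\theta(\tau)}{p_\beta(\tau)}\, R(\tau)\right], \qquad \frac{p_\theta(\tau)}{p_\beta(\tau)} = \prod_{t=0}^{T-1} \frac{\pi_\theta(a_t \mid s_t)}{\beta(a_t \mid s_t)}$$

The initial-state distribution and the transition dynamics appear in both $p_\theta(\tau)$ and $p_\beta(\tau)$, so they cancel and only the ratio of action probabilities remains. The identity requires $\beta(a \mid s) > 0$ wherever $\pi_\theta(a \mid s) > 0$.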

edited Feb 15 '19 at 19:43 by nbro

answered Jan 08 '19 at 02:33 by apuffin

What does the tau value mean? – asaf92 Mar 22 '20 at 10:38
