
I ask because PPO is apparently an on-policy algorithm, and the HER paper says that HER can be combined with any off-policy algorithm. Yet I have seen GitHub projects that combine the two somehow.

How is this done? And is it reasonable?
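
To make the tension concrete, here is a minimal sketch of the hindsight relabeling step as I understand it from the HER paper (using the "final" goal-selection strategy; `her_relabel`, `compute_reward`, `reached_goal`, and the toy integer states are placeholders of my own, not from any particular library):

```python
# Hypothetical sketch of HER's "final" goal-relabeling strategy, assuming
# transitions of the form (state, action, goal, reward, next_state).
# All names here are illustrative placeholders, not a real library API.

def reached_goal(state, goal):
    # Placeholder success test: treat states and goals as comparable values.
    return state == goal

def compute_reward(next_state, goal):
    # Typical sparse goal-conditioned reward: 0 on success, -1 otherwise.
    return 0.0 if reached_goal(next_state, goal) else -1.0

def her_relabel(episode):
    """Return extra transitions whose goal is replaced by the state actually
    reached at the end of the episode (the 'final' strategy)."""
    achieved = episode[-1]["next_state"]
    relabeled = []
    for t in episode:
        relabeled.append({
            "state": t["state"],
            "action": t["action"],
            "goal": achieved,  # hindsight: pretend this was the goal all along
            "reward": compute_reward(t["next_state"], achieved),
            "next_state": t["next_state"],
        })
    return relabeled

# Toy episode: the agent never reaches the original goal 9, so every
# original reward is -1.
episode = [
    {"state": s, "action": 0, "goal": 9, "reward": -1.0, "next_state": s + 1}
    for s in range(3)
]

# After relabeling against the achieved state 3, the last transition succeeds.
for t in her_relabel(episode):
    print(t)
```

My confusion is that the relabeled transitions were collected while pursuing a different goal, so for a goal-conditioned policy they no longer look like on-policy data, which is what PPO's objective assumes.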

profPlum

0 Answers