I ask because PPO is apparently an on-policy algorithm, and the HER paper says that HER can be combined with any off-policy algorithm. Yet I see GitHub projects that have combined them somehow.
How is this done? And is it reasonable?