
I'm dealing with a project that has episodes of variable length, ranging from just 3 steps to 20 steps. I'm guessing that this may cause problems with GAE, as actions in longer episodes will have much larger advantages than actions in shorter episodes, simply because of the cascading addition of future rewards/costs. Is there some smart way of dealing with discounted future returns in such scenarios? Thank you.

1 Answer


Generally you can make an agent care less about rewards in the future by decreasing the discount factor $\gamma$.
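Concretely, the GAE advantage is a discounted sum of TD residuals, so $\gamma$ (together with the GAE parameter $\lambda$) sets how quickly contributions from distant timesteps decay:

$$\hat{A}_t = \sum_{l=0}^{T-t-1} (\gamma\lambda)^l \,\delta_{t+l}, \qquad \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t).$$

Shrinking $\gamma$ (or $\lambda$) makes the far-future terms vanish faster, which reduces how much the extra steps in a 20-step episode can inflate its advantages relative to a 3-step one.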

that this may cause problems with GAE, as actions in longer episodes will have much larger advantages

Depending on the particular task you are solving this might not be a problem. If a longer episode gives an agent a higher expected return, then the agent should learn to prolong the episode. Or if the episode length is independent of the agent's actions, the agent should still learn to act optimally with respect to the random distribution of episode length.
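If you want to see the effect directly, here is a minimal sketch (my own toy example, not part of the original answer) that computes GAE advantages for a 3-step and a 20-step episode, assuming a constant reward of 1 per step and an untrained critic that predicts 0 everywhere:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for a single finite episode.

    `values` has one more entry than `rewards`: the last entry is the
    bootstrap value of the state after the final step (0.0 if terminal).
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# Toy episodes: reward 1 at every step, value estimates all zero.
short = gae_advantages(np.ones(3), np.zeros(4))
long_ = gae_advantages(np.ones(20), np.zeros(21))
print(short[0], long_[0])   # first-step advantage grows with episode length

# Lowering gamma (or lambda) shrinks the gap between the two episodes:
print(gae_advantages(np.ones(20), np.zeros(21), gamma=0.9)[0])
```

With $\gamma$ and $\lambda$ close to 1, the first-step advantage grows almost linearly with episode length; lowering either parameter closes that gap.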

twink_ml
  • The episode length is always predetermined (there are always certain options, but which of these options are taken changes the cost of the outcome). I wonder if there is a way to incorporate the length of the episode, which I already know, into the training... – Antonis Karvelas Nov 29 '22 at 14:13
  • You can always include this in your state space -- for example, you could add an integer feature for the known episode length, and an integer feature for the number of timesteps until episode completion. – twink_ml Nov 29 '22 at 16:42
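To illustrate the suggestion in the last comment, here is a minimal sketch assuming a flat (vector) observation; `augment_observation` is a hypothetical helper, so adapt it to your environment's actual observation format:

```python
import numpy as np

def augment_observation(obs, episode_length, t):
    """Append the known episode length and the number of remaining steps
    to the observation vector, so the policy and value networks can
    condition on them."""
    extra = np.array([episode_length, episode_length - t], dtype=np.float32)
    return np.concatenate([np.asarray(obs, dtype=np.float32), extra])

# Example: a 4-dimensional observation at timestep 5 of a 20-step episode.
obs = np.zeros(4, dtype=np.float32)
print(augment_observation(obs, episode_length=20, t=5))  # 6-dimensional vector
```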