I have a custom stock-trading environment in which an episode can last 2000-3000 steps. I've run several experiments with the TD3 and SAC algorithms, and the average reward per episode flattens after a few episodes. I believe the average reward per episode should improve further, so I wonder whether my training episodes are too long. Is there a recommended upper limit on episode length?
-
The optimal length for an episode during training is a hyper-parameter (so it's probably _tuneable_). For example, in a maze environment, where the agent needs to get from one location to another, let's say that the optimal number of steps to reach the goal is $N$ (i.e. the length of the shortest path). However, if you set the episode length to $N$ (during training), most episodes will be unsuccessful because of the stochastic nature of the training process: an agent that is still exploring rarely follows the shortest path exactly. Off the top of my head, I don't know of any recommended guidelines for setting this value, so I can't provide a formal answer right now. – nbro May 28 '21 at 17:05
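To illustrate the maze argument above, here is a minimal sketch, not anyone's actual setup. It assumes a 1-D corridor instead of a maze and a purely random policy as a stand-in for an exploring agent; `success_rate`, `goal_distance` and `max_steps` are hypothetical names chosen for this example.

```python
import random

def success_rate(goal_distance, max_steps, episodes=2000, seed=0):
    """Fraction of episodes in which a uniformly random walker reaches the goal.

    Assumption: a 1-D corridor with a wall at position 0 and the goal at
    `goal_distance`, so the shortest path has exactly `goal_distance` steps.
    """
    rng = random.Random(seed)
    successes = 0
    for _episode in range(episodes):
        position = 0
        for _step in range(max_steps):
            # Move right (+1) or left (-1); the wall at 0 blocks moves below 0.
            position = max(0, position + rng.choice((-1, 1)))
            if position == goal_distance:
                successes += 1
                break
    return successes / episodes

N = 5  # shortest-path length in this toy corridor
for limit in (N, 4 * N, 20 * N):
    print(f"max_steps={limit:3d}  success rate={success_rate(N, limit):.3f}")
```

With the cap set to exactly $N$, the random walker only succeeds when every single move goes towards the goal, so almost all episodes end without reaching it; a looser cap gives exploration room to succeed.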
-
One trick for training an RNN to predict long sequences is to start with shorter sequences (e.g. 100 timesteps) and then increase the length over time to the target (e.g. 1000 timesteps). I would imagine a similar strategy could work well for DRL. – Taw May 30 '21 at 00:23
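A rough sketch of what such a curriculum on episode length might look like. The 100 and 1000 timestep values come from the comment above; the linear ramp, the `ramp_episodes=500` default and the function name `max_episode_steps` are assumptions made for illustration only.

```python
def max_episode_steps(episode_idx, start_len=100, target_len=1000, ramp_episodes=500):
    """Linearly grow the per-episode step cap from start_len to target_len.

    Assumption: a linear schedule over `ramp_episodes` episodes; other
    schedules (stepwise, exponential) would fit the same idea.
    """
    if episode_idx >= ramp_episodes:
        return target_len
    fraction = episode_idx / ramp_episodes
    return int(start_len + fraction * (target_len - start_len))

# Print the cap at a few points in training.
for ep in (0, 125, 250, 500, 1000):
    print(ep, max_episode_steps(ep))
```

In a training loop, the returned value would be used to cut each episode off after that many steps, so early episodes are short and later ones reach the full length.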