Optimal episode length in reinforcement learning

Asked May 28 '21 at 15:48

Active May 28 '21 at 17:00


I have a custom environment for stock trading where an episode can be as long as 2000-3000 steps. I've run several experiments with the TD3 and SAC algorithms, and the average reward per episode flattens after a few episodes. I believe the average reward per episode should improve further, so I wonder whether my training episodes are too long. What is the recommended upper limit on the episode length?

edited May 28 '21 at 17:00

nbro


asked May 28 '21 at 15:48

Mika

The optimal length for an episode during training is a hyper-parameter (so it's probably _tuneable_). For example, in a maze environment, where the agent needs to get from one location to another, let's say that the optimal number of steps to reach the goal is $N$ (i.e. the length of the shortest path). However, if you set the episode length to $N$ (during training), most episodes will be unsuccessful because of the stochastic nature of the training process: any exploratory step off the shortest path means the agent runs out of steps before reaching the goal. Off the top of my head, I don't know if there are any recommended guidelines to set this value, so I can't provide a formal answer right now. – nbro May 28 '21 at 17:05
One trick for training an RNN to predict long sequences is to start with shorter sequences (e.g. 100 timesteps) and then increase the length over time to the target, e.g. 1000 timesteps. I would imagine a similar strategy could work well for DRL. – Taw May 30 '21 at 00:23
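To make the two comments above concrete, here is a minimal sketch of treating the episode-length cap as a tunable value that grows over training. Everything in it is an assumption for illustration: the function name `max_steps_schedule`, the linear ramp, and the default values (100 and 1000 are the example figures from the comment, `ramp_episodes=50` is arbitrary). It is not taken from any library and has not been validated on a trading environment.

```python
def max_steps_schedule(episode_idx, start=100, target=1000, ramp_episodes=50):
    """Hypothetical curriculum: linearly grow the episode-length cap.

    episode_idx   -- zero-based index of the training episode
    start         -- cap used for the very first episode (assumed value)
    target        -- final cap, reached after `ramp_episodes` episodes
    ramp_episodes -- number of episodes over which the cap grows (assumed)
    """
    if ramp_episodes <= 0 or episode_idx >= ramp_episodes:
        return target
    frac = episode_idx / ramp_episodes
    return int(start + frac * (target - start))


# Print the cap that would apply at a few points during training.
for episode_idx in (0, 10, 25, 50, 200):
    print(episode_idx, max_steps_schedule(episode_idx))
```

In a training loop, the returned value would be used to cut the episode off once that many steps have been taken. For the environment in the question, `target` would be something like 2000-3000, and `start` and `ramp_episodes` become extra hyper-parameters to tune. One thing to keep in mind with off-policy methods such as TD3 and SAC: a cut-off caused by the step cap is a truncation, not a true terminal state, so it is usually worth checking how your implementation bootstraps from the last state in that case.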
