
Let's consider a non-episodic problem, for example a game that can go on forever.

My question is: Why are agents still trained in episodes?

My understanding is that the agent's neural network is updated in batches according to the batch size (so every x timesteps, the network is updated). Therefore, nothing special happens at the end of an episode, right? The agent does not "review" its performance and update itself again or anything like that.
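One thing worth noting: in most value-based implementations, something small but real *does* happen at an episode boundary. A "done" flag cuts off bootstrapping in the TD target, so the value of the terminal transition is just the immediate reward. A minimal sketch (the function name `td_target` is made up for illustration; this is the standard TD(0) target form, not any particular library's API):

```python
# Sketch of how a "done" flag changes the TD(0) learning target.
gamma = 0.99  # discount factor (assumed value for illustration)

def td_target(reward, next_value, done):
    """TD(0) target: bootstrapping from V(s') is zeroed out at episode end."""
    return reward + gamma * (1.0 - done) * next_value

# Mid-episode transition: the target bootstraps from the next state's value.
mid = td_target(1.0, 5.0, done=0.0)   # 1.0 + 0.99 * 5.0 = 5.95

# Terminal transition: the target collapses to the immediate reward.
end = td_target(1.0, 5.0, done=1.0)   # 1.0
```

So whether the environment is "really" episodic or not, declaring an episode over changes what the network is trained toward at that timestep.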

Nevertheless, I find that shorter episode lengths can be helpful in my problem, but for the above reasons, I don't understand why 100 short episodes would be better than 1 huge long episode, or why episode length would make any difference at all in a non-episodic task.

Any insight would be greatly appreciated!

EDIT: I suppose I'm trying to understand whether anything "special" happens in the network at the end of an episode. If not, imagine a situation where I have a time-series environment and I train the agent in one single episode from t=0 to t=100.

Now, say I instead train 10 agents, each starting at a different multiple of 10 (t=0, t=10, t=20, etc.), where each new agent's starting environment is exactly the end environment of the previous agent (the start env for the agent at t=20 equals the end env of the agent that trained from t=10 to t=19, preserving continuity). Would these two setups produce the exact same final model/agent? That is, would the 10th agent of the divided-data group be identical to the agent at the end of the single t=0 to t=100 run?

Maybe this is impossible because of how the gamma discount parameter interacts with episode length. What if we set an extreme gamma parameter, so that the differences are negligible? I just want to understand whether the agent, in any real way, considers "episodes" in its learning/training.
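To make the gamma interaction in the edit concrete: the two setups generally cannot produce identical agents, because the discounted return-to-go is truncated at every episode boundary. A sketch under assumed values (constant reward of 1, gamma = 0.99; `discounted_returns` is a hypothetical helper, not from any library):

```python
def discounted_returns(rewards, gamma):
    """Discounted return-to-go G_t for each timestep within one episode."""
    G = 0.0
    out = []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]

gamma = 0.99
rewards = [1.0] * 100  # assume a constant reward of 1 per step

# One long episode: the return at t=0 sums 100 discounted rewards.
long_G0 = discounted_returns(rewards, gamma)[0]    # ~63.4

# Ten 10-step episodes: each episode's return is cut off after 10 rewards.
short_G0 = discounted_returns(rewards[:10], gamma)[0]  # ~9.56

# The truncated return is strictly smaller, so the learning targets differ.
```

With gamma close to 0 the tail terms shrink and the gap becomes negligible, which matches the intuition in the edit that an "extreme" gamma could make the two setups nearly equivalent.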

  • I tried to give some of that information here: https://ai.stackexchange.com/questions/35793/for-continuing-tasks-is-the-choice-of-episode-length-completely-arbitrary/35796#35796 - it suggests there is a balance to be struck due to possibly special start conditions or isolated loops. Could you suggest what additional information might help you? – Neil Slater Jun 08 '22 at 19:17
  • @NeilSlater I edited the question. Does that make it more distinct and clear? I agree that the answer you gave definitely touches on this, but I think I'm trying to really understand if there's something special about an episode ending, or if it's purely a hyperparam (as you mentioned) which determines the states an agent can reach. – Vladimir Belik Jun 08 '22 at 19:28
