To answer your question, the specifics of some of the OpenAI Gym environments can be found on their wiki:
The episode ends when you reach 0.5 position, or if 200 iterations are reached.
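You can verify the 200-step cap yourself; a minimal sketch, assuming the classic `gym` package (the attribute also exists in the newer `gymnasium` fork):

```python
import gym

env = gym.make("MountainCar-v0")

# The step cap is applied by a TimeLimit wrapper around the raw environment.
print(env.spec.max_episode_steps)  # prints 200
```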
There is a deeper question in what you asked, though:
My initial understanding was that an episode should end when the Car reaches the flagpost.
The environment certainly could be set up that way. Limiting the number of steps per episode has the immediate benefit of forcing the agent to reach the goal state within a fixed amount of time, which encourages speedier trajectories (MountainCar-v0 further penalizes long trajectories through its reward signal of -1 per step).

Also, the underlying learning algorithm may only perform policy updates after an episode completes, as Monte Carlo methods do. If the agent would never reach the goal state under its current policy (e.g., the policy is poor and lacks sufficient randomness), then terminating the episode after a fixed number of steps ensures that the agent can still perform a policy update and try a new policy on the next episode. (Alternatively, the learning algorithm could perform policy updates during the episode, as temporal-difference methods do.)

There is also a class of tasks, called continuing tasks, that never terminate (see Section 3.3 of Sutton and Barto), so the choice to limit the number of steps per episode depends heavily on the task at hand and the choice of learning algorithm.
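To make the two termination conditions concrete, here is a minimal sketch of a single episode under a random policy, using the pre-0.26 `gym` API (where `step` returns a 4-tuple). With a random policy the car essentially never reaches the flag, so it is the 200-step limit that ends the episode:

```python
import gym

env = gym.make("MountainCar-v0")
obs = env.reset()

done = False
steps = 0
episode_return = 0.0
while not done:
    action = env.action_space.sample()          # stand-in for a learned policy
    obs, reward, done, info = env.step(action)  # reward is -1 on every step
    steps += 1
    episode_return += reward

# obs[0] is the car's position; the goal position is 0.5.
if obs[0] >= 0.5:
    print(f"Reached the flag in {steps} steps, return = {episode_return}")
else:
    print(f"Time limit hit after {steps} steps, return = {episode_return}")
env.close()
```

Either way the episode terminates, so an algorithm that updates only at episode boundaries (e.g., Monte Carlo control) always gets a chance to improve its policy.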