Printing action_space for Pong-v0 gives Discrete(6) as output, i.e. $0, 1, 2, 3, 4, 5$ are the actions defined in the environment, as per the documentation. However, the game needs only 2 controls. Why is there this discrepancy? Further, is it necessary to identify which number from 0 to 5 corresponds to which action in a gym environment?
-
Programming questions (including how to use a certain library or API) are off-topic here. This type of question is more appropriate for Stack Overflow, even though the question is still related to Artificial Intelligence/Reinforcement Learning. Here, we focus on the theoretical, social, and philosophical aspects of AI. Please, see https://ai.stackexchange.com/help/on-topic for more details. – nbro Nov 25 '20 at 11:05
3 Answers
You can try the actions yourself, but if you want another reference, check out the documentation for ALE on GitHub.
In particular, 0 means no action, 1 means fire, which is why they don't have an effect on the racket.
Here's a better way:
env.unwrapped.get_action_meanings()
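As a minimal sketch of how that list can be used, the snippet below groups action indices that differ only by a FIRE component. The hard-coded list is an assumption about what env.unwrapped.get_action_meanings() reports for Pong, so check it against your own installation. The helper name group_equivalent_actions is hypothetical, and it relies on the observation in this thread that FIRE does nothing to the racket.

```python
from collections import defaultdict

# Assumption: this is the list reported for Pong by
# env.unwrapped.get_action_meanings(); verify it on your own install.
PONG_MEANINGS = ["NOOP", "FIRE", "RIGHT", "LEFT", "RIGHTFIRE", "LEFTFIRE"]


def group_equivalent_actions(meanings):
    """Group action indices whose meaning differs only by a FIRE component.

    Hypothetical helper: it assumes FIRE has no effect on the racket,
    which matches the observations reported for Pong in this thread.
    """
    groups = defaultdict(list)
    for index, name in enumerate(meanings):
        # "RIGHTFIRE" -> "RIGHT"; a bare "FIRE" collapses to "NOOP".
        base = name.replace("FIRE", "") or "NOOP"
        groups[base].append(index)
    return dict(groups)


groups = group_equivalent_actions(PONG_MEANINGS)
print(groups)

# One representative index per group gives a reduced action set.
reduced = sorted(indices[0] for indices in groups.values())
print(reduced)
```

With a live environment, the same helper could be called as group_equivalent_actions(env.unwrapped.get_action_meanings()). An agent restricted to the reduced set then only has to choose among three distinct behaviours instead of six indices.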

You can figure out exactly what an action does with a script like this:
import gym
import IPython.display
from PIL import Image

env = gym.make("Pong-v0")
action = 0  # modify this!
o = env.reset()
for i in range(5):  # repeat one action five times
    o = env.step(action)[0]  # keep only the observation
IPython.display.display(
    Image.fromarray(
        o[:, 140:142]  # extract your bat
    ).resize((300, 300))  # bigger image, easier to visualize
)
Actions 0 and 1 seem useless, as nothing happens to the racket. Actions 2 and 4 make the racket go up, and actions 3 and 5 make the racket go down.
The interesting part is that when I run the script above twice for the same action (from 2 to 5), I get different results. Sometimes the racket reaches the top (or bottom) border, and sometimes it doesn't. I think there might be some randomness in the speed of the racket, so it might be hard to measure which type of UP (2 or 4) is faster.

There seems to be no difference between actions 2 and 4, or between actions 3 and 5. The inconsistency mentioned by Icyblade is due to the mechanics of the Pong environment.
"Each action is repeatedly performed for a duration of k frames, where k is uniformly sampled from {2,3,4}"
So the same action is simply repeated for a different number of frames on each step, which is why the racket travels a different distance from run to run.
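To see how much this alone can vary, here is a small sketch that models only the frame counts, not the emulator. It assumes each call to env.step() covers k frames with k drawn uniformly from {2, 3, 4}, as in the quote above. The names frames_advanced and skip_choices are made up for illustration.

```python
import random


def frames_advanced(num_steps, rng, skip_choices=(2, 3, 4)):
    """Total emulator frames covered by num_steps calls to env.step().

    Sketch only: each step repeats the action for k frames, with k drawn
    uniformly from skip_choices, as in the quoted description.
    """
    return sum(rng.choice(skip_choices) for _ in range(num_steps))


rng = random.Random(0)  # seeded only to make the sketch reproducible
totals = [frames_advanced(5, rng) for _ in range(1000)]

# Five steps always cover between 10 and 20 frames under this model.
print(min(totals), max(totals))
```

Under this model, five steps of the same action can move the racket for anywhere between 10 and 20 frames, which would be enough to explain why it sometimes reaches the border and sometimes does not.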
