
I have been using OpenAI Retro for a while, and I wanted to experiment with two-player games. By two-player games, I mean co-op games like "Tennis-Atari2600" or even Pong, where 2 agents are present in one environment.

There is a `players` parameter in the OpenAI Retro documentation, but setting it to 2 seems to have no effect on the game.

How do you properly implement this? Can this even be done? The end goal is to have 2 separate networks in one environment.

niallmandal
    The documentation at https://retro.readthedocs.io/en/latest/python.html implies `env = retro.make(game='Pong-Atari2600', players=2)` is valid - have you tried the demo code they give there? It appears to be an implementation of my second suggestion (splitting the reward signal and presenting both controllers in the action space). I had a quick look but cannot find the ROM so gave up for now – Neil Slater Mar 15 '19 at 16:25
  • OK, I found the ROM now, and the sample code on that page works fine for me, controlling both agents, and reversing the reward between P1 and P2 as expected. What goes wrong for you? – Neil Slater Mar 15 '19 at 16:42
  • I am not sure... Maybe the environments that I used did not support 2 players. It would be helpful if there was documentation for this lol. – niallmandal Mar 15 '19 at 16:51
  • I have extended my answer based on what I found out about Pong – Neil Slater Mar 15 '19 at 17:08
  • I am currently in the same position as you, looking for two-player games (for openAI gym). Any update on your question? – YoYO Man Jun 28 '20 at 23:17
  • @YoYOMan See my answer (and comments below it). Open AI gym Pong can be set up for 2 agent-based players. – Neil Slater Jun 29 '20 at 08:08

1 Answer


OpenAI Retro is an extension to OpenAI Gym.

As such, it does not support multiple agents in multi-player environments. Instead, the environment is always presented from the perspective of a single agent that needs to solve a control problem.

When there is more than one agent in a Gym problem, everything other than the first agent is considered part of the environment. Typically player 2 in 2-player problems is controlled by whatever game AI already exists.

You can work around this in a couple of ways. Both involve modifying existing environments, which requires a solid understanding of the integration and wrapper layers in Gym so that you can make correct edits. However, you might find environments where either of these has already been done for you.

In fact, some multiplayer games within Retro are already supported using the second option below.

1. Supply additional player(s) as part of the environment

You could modify the environment code so that it accepts a parameter specifying the agent used to control the other player(s). Typically such an agent would not be set up to learn; it could just be an earlier version of the agent that is learning. You would need to code the interface between the supplied agent and the game engine, taking care to allow for its potentially different view of success.

If a Gym environment has been set up so that you (or any human) can compete against your own agent using the keyboard, that is already halfway to this solution. You could look at where the keyboard input is handled and modify the code to accept automated input instead of action selection by keyboard.

This approach probably works best for competitive games, where you can use a random selection of previous agents inserted into the environment to train the next "generation".
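As a rough illustration of this first approach, here is a minimal sketch of a Gym wrapper that hides the second player behind a supplied policy, assuming the underlying environment already exposes both controllers in a single action vector and returns one reward per player (as in the second option below). The class name `FixedOpponentWrapper`, the `opponent_policy` callable, and the half-and-half button split are my own placeholders, not part of Retro:

```python
import numpy as np
import gym


class FixedOpponentWrapper(gym.Wrapper):
    """Hypothetical sketch: expose a 2-player environment as single-agent.

    The supplied opponent_policy (for example, a frozen earlier copy of the
    learning agent) picks P2's buttons every step, so from the learner's
    point of view the opponent is simply part of the environment.
    """

    def __init__(self, env, opponent_policy):
        super().__init__(env)
        self.opponent_policy = opponent_policy
        # Assumed button layout for a 16-wide 2-player action vector;
        # verify the exact indices for your game before relying on this.
        self.n = env.action_space.shape[0]
        self.half = self.n // 2
        # Present only P1's buttons to the learning agent
        self.action_space = gym.spaces.MultiBinary(self.half)
        self._last_obs = None

    def reset(self, **kwargs):
        self._last_obs = self.env.reset(**kwargs)
        return self._last_obs

    def step(self, p1_action):
        combined = np.zeros(self.n, dtype=np.int8)
        combined[:self.half] = p1_action
        combined[self.half:] = self.opponent_policy(self._last_obs)
        obs, rewards, done, info = self.env.step(combined)
        self._last_obs = obs
        # With players=2 the reward comes back per player; keep P1's share.
        return obs, rewards[0], done, info
```

You would then wrap a two-player environment, for example `FixedOpponentWrapper(retro.make('Pong-Atari2600', players=2), opponent_policy=frozen_agent)`, and train your P1 agent against it as if it were an ordinary single-agent Gym environment.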

2. Present actions for multiple players as if there was a single player

Provided the game emulator allows it, you can edit the code so that multiple players are controlled at once. The Gym environment does not specify how you construct your action choices, so you are then free to split the action vector up so that parts of it are chosen by one trained agent and parts of it by another (a worked sketch of this appears at the end of this answer).

You may need to work on different views of the environment and/or reward signal in this case. For instance, in a competitive game (or a co-operative game with competitive elements such as personal scores), you will need to add custom code to re-interpret reward values compared to the single-player game. For something simple like Pong, that could just be that Player 2 receives the negative of Player 1's reward signal.

This approach probably works best for cooperative games where success is combined. In that case, you have the choice of writing any number of separate agents that control different aspects of the action space - that would include having agents that separately control the P1 and P2 game controller inputs.

Each agent would still effectively treat its partner as part of the environment, so you may want to mix training scenarios so that the agents for each controller do not become too co-dependent. For instance, if you hope to step in to control P1, you probably don't want the AI running P2 to have too specific expectations of P1's role and behaviour in the game.

This second option is what the OpenAI Retro interface does for supported games. For instance, Pong supports this:

import retro

env = retro.make(game='Pong-Atari2600', players=2)

I verified this using the sample code in the documentation, which controls both paddles and receives the reward signal as a Python list: mostly [0.0, 0.0], but whenever a player scores it becomes [-1.0, 1.0] or [1.0, -1.0], depending on which player got the point.
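For reference, the documented demo is roughly along the following lines (reconstructed from memory, so treat it as a sketch rather than a verbatim copy of the docs):

```python
import retro

env = retro.make(game='Pong-Atari2600', players=2)
obs = env.reset()
done = False
while not done:
    # With players=2 the sampled action covers both controllers,
    # so both paddles move.
    obs, rewards, done, info = env.step(env.action_space.sample())
    env.render()
    # rewards is returned per player: mostly [0.0, 0.0], becoming
    # [1.0, -1.0] or [-1.0, 1.0] when one side scores a point.
env.close()
```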

This will only be supported on Retro games where someone has made the effort to integrate it as above. If you have a game that does not do this, but in theory supports multiplayer on the ROM, you will need to look at and alter the Retro wrapper code, the emulator, or both.
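Putting the pieces together, a minimal sketch of the second option for Pong might look like the following. The `RandomAgent` class is just a stand-in for your two separate networks, and the half-and-half index split is an assumption; per the comments below, Pong appears to use indices 0-5 for P1 and 6-11 for P2, so verify the layout for your game:

```python
import numpy as np
import retro


class RandomAgent:
    """Placeholder policy; substitute your own trained network per player."""
    def __init__(self, n_buttons):
        self.n_buttons = n_buttons

    def act(self, observation):
        return np.random.randint(0, 2, size=self.n_buttons)


env = retro.make(game='Pong-Atari2600', players=2)
n_buttons = env.action_space.shape[0]   # 16 with two controllers

# Assumed split: first half of the vector for P1, second half for P2.
# Check the actual button mapping for your game before training.
half = n_buttons // 2
agent_p1 = RandomAgent(half)
agent_p2 = RandomAgent(half)

obs = env.reset()
done = False
while not done:
    combined = np.zeros(n_buttons, dtype=np.int8)
    combined[:half] = agent_p1.act(obs)
    combined[half:] = agent_p2.act(obs)
    obs, rewards, done, info = env.step(combined)
    # rewards[0] belongs to P1 and rewards[1] to P2, so each agent can be
    # trained on its own element (in Pong they are negatives of each other).
env.close()
```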

Neil Slater
  • Do you have any idea how I could implement the second method? This is what I was planning to do, I am just not sure how to tackle it. Any ideas? – niallmandal Mar 15 '19 at 01:29
  • @niallmandal: Not in any more detail than I have presented. It will involve digging in to the emulator API to see how to work with a second controller. – Neil Slater Mar 15 '19 at 07:21
  • Sorry, kind of a basic question, but what exactly is the emulator API? – niallmandal Mar 15 '19 at 13:12
  • @niallmandal: I just mean the developer's interface that you are expected to use when writing code to interact with it. Hopefully, it is described in the documentation of the Python module or classes. It is unfortunately possible that you may need to look inside that code as well, if the P2 controller is not officially supported. – Neil Slater Mar 15 '19 at 13:31
  • Do you think this is possible with the retro integration tool? – niallmandal Mar 15 '19 at 16:07
  • @niallmandal I don't know what that is. It does seem like the sort of thing that would be involved, but I am not sure if it would be the right place to *start* looking – Neil Slater Mar 15 '19 at 16:10
  • Sorry to keep bothering, but something else came up. When initiating the Pong game with 2 players, there are 16 possible inputs (the controls). They are represented as a list that looks like this `[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]`. How do I separate out which inputs control P1 and which control P2? When testing each of the inputs, I could only move P1. Am I doing something wrong? – niallmandal Mar 18 '19 at 02:54
  • @niallmandal: According to the docs, the first 8 values are for P1, the second 8 values are for P2. I only tried the demo so far which randomises the action vector and results in both paddles moving as you would expect. Just now I tried to only set P1 values or only set P2 values and got very weird results. I could control one paddle independently by setting `actions[0:7] = 0`, but could not control the other paddle. So I am sorry, but I don't know - there seems to be a mismatch between the documentation and behaviour, maybe a bug – Neil Slater Mar 18 '19 at 07:29
  • @niallmandal: OK, solved it, at least partially. Indices 0,1,2,3,4,5 are for P1. Indices 6,7,8,9,10,11 are for P2. No idea what indices 12,13,14,15 are for . . . but they don't make either paddle move – Neil Slater Mar 18 '19 at 10:46