
I was reading this article about the question "Why do we dream?", in which the author discusses dreams as a form of rehearsal for future threats and presents this as an evolutionary advantage. My question is whether this idea has been explored in the context of RL.

For example, in a competition between AIs in a shooter game, one could design an agent that, besides the behavior it has learned during "normal" training, looks for moments when it is out of danger and then uses its in-game computation time to run simulations that further optimize its behavior. Since the agent still needs to stay somewhat aware of its environment, it could alternate between processing the environment and this kind of simulation. Note that this "in-game" simulation has an advantage over the "pre-game" simulations used for training: during the game, the agent experiences the behavior of the other agents, which could not have been predicted beforehand, and it can then simulate on top of those experiences, e.g. by slightly modifying them.
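To make this concrete, here is a rough toy sketch of what I have in mind (the corridor environment, the constants, and the "dreaming" loop are all made up for illustration; this is not taken from any paper or library):

```python
import random
from collections import defaultdict

# Toy sketch (all names and numbers invented for illustration): an agent that
# learns from real steps and, during "idle" steps, replays remembered
# transitions from its stored model of the world to do extra learning.

N_STATES, GOAL = 6, 5            # tiny 1-D corridor; reaching state 5 gives reward
ACTIONS = [-1, +1]               # move left / right

def env_step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

Q = defaultdict(float)           # action values
model = {}                       # remembered transitions: (s, a) -> (s', r)
alpha, gamma, eps, dream_steps = 0.1, 0.95, 0.1, 20

def q_update(s, a, r, s2):
    best_next = max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def choose(s):                   # epsilon-greedy with random tie-breaking
    if random.random() < eps:
        return random.choice(ACTIONS)
    best = max(Q[(s, b)] for b in ACTIONS)
    return random.choice([b for b in ACTIONS if Q[(s, b)] == best])

for episode in range(200):
    s, done = 0, False
    while not done:
        a = choose(s)
        s2, r, done = env_step(s, a)
        q_update(s, a, r, s2)            # learn from the real transition
        model[(s, a)] = (s2, r)          # remember what actually happened
        s = s2
        # "idle time": replay remembered experience ("dreaming") for extra updates
        for _ in range(dream_steps):
            (ds, da), (ds2, dr) = random.choice(list(model.items()))
            q_update(ds, da, dr, ds2)
```

The point is just the structure: learn from real transitions while playing, and use the idle steps for extra updates on remembered (or slightly perturbed) experience.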

For more experienced folks: does this idea make sense? Has something similar been explored?

I have absolutely no experience in the field, so I apologize if this question is poorly worded, dumb or obvious. I would appreciate suggestions on how to improve it if this is the case.

  • It's not clear to me what restriction you're actually putting on the "imagination" strategy. Is the restriction that it needs to be performed online, while the agent learns and interacts with the environment? – nbro Aug 26 '20 at 22:13
  • It seems to me that dreaming, and sleep, wouldn't be necessary for an algorithm, as the function we humans perform while sleeping could likely be accomplished in parallel. The closest thing I can think of, offhand, is "[goalless search](https://www.quantamagazine.org/computers-evolve-a-new-path-toward-human-intelligence-20191106/)", which is a form of algorithmic creativity. – DukeZhou Aug 26 '20 at 23:55

2 Answers


Yes, the concept of dreaming or imagining has already been explored in reinforcement learning.

For example, have a look at Metacontrol for Adaptive Imagination-Based Optimization (2017) by Jessica B. Hamrick et al., a paper that I gave a talk/presentation on 1-2 years ago (though I no longer remember the details well).

There is also a blog post on the topic, Agents that imagine and plan (2017) by DeepMind, which discusses two more recent papers and also mentions Hamrick's paper.

In 2018, another related and interesting paper, World Models by Ha and Schmidhuber, was presented at NIPS.
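To give a rough flavor of that recipe (my own toy numerical sketch, not the paper's actual architecture, which uses a VAE, an RNN and an evolved controller): collect experience in the real environment, fit a model of its dynamics, and then optimize behavior entirely inside the learned model, i.e. "in the dream".

```python
import numpy as np

# Rough numerical illustration of "learn a model, then optimize in the dream".
# The 1-D linear system, the least-squares model and the grid search over a
# feedback gain are toy stand-ins of my own, not the paper's method.

rng = np.random.default_rng(0)

def real_step(x, u):
    """The real environment (unknown to the agent): x' = 0.9x + 0.5u + noise."""
    return 0.9 * x + 0.5 * u + 0.05 * rng.normal()

# 1) Collect experience in the real environment with random actions.
xs, us, xns = [], [], []
x = 1.0
for _ in range(500):
    u = rng.uniform(-1.0, 1.0)
    xn = real_step(x, u)
    xs.append(x); us.append(u); xns.append(xn)
    x = xn

# 2) Fit a simple world model x' ~ a*x + b*u by least squares.
a, b = np.linalg.lstsq(np.column_stack([xs, us]), np.array(xns), rcond=None)[0]

# 3) "Dream": evaluate candidate policies u = -k*x using only the learned model.
def dream_return(k, steps=50):
    x, total = 1.0, 0.0
    for _ in range(steps):
        x = a * x + b * (-k * x)     # rollout in the model, not the real env
        total -= x ** 2              # cost: keep the state near zero
    return total

best_k = max(np.linspace(0.0, 3.0, 61), key=dream_return)
print(f"feedback gain found purely in the dream: {best_k:.2f}")
```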

If you search for "imagination/dreaming in reinforcement learning" on the web, you will find more papers and articles about this interesting topic.


Model-based RL is obviously the correct approach, mainly because it lets you simulate the environment internally without direct interaction.

And all successful RL algorithms are essentially model-based, because nobody has done real-time RL and been successful.
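As a tiny illustration of what "simulate the environment internally" means (a made-up toy example, not from any specific paper): once the agent has a model of the transitions, it can evaluate candidate actions by lookahead without taking a single real step.

```python
# Made-up toy example: with a learned transition model, the agent can pick an
# action by one-step lookahead, entirely internally, without acting in the
# real environment.

gamma = 0.9
V = {"safe": 1.0, "danger": -1.0}        # (toy) values of the possible next states
model = {                                # learned model: (state, action) -> (next_state, reward)
    ("corridor", "advance"): ("danger", -0.5),
    ("corridor", "retreat"): ("safe", 0.0),
}

def plan(state, actions):
    """Pick the action whose model-predicted outcome looks best."""
    def score(a):
        nxt, r = model[(state, a)]
        return r + gamma * V[nxt]
    return max(actions, key=score)

print(plan("corridor", ["advance", "retreat"]))   # -> retreat
```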

  • Right, but I guess my question is about the idea of model-based RL in which agents keep the information they observe during "busy times" (a fight, in the case of a shooter game) and then look for idle times in which they can better process that information, by adapting to it or even running simulations over variations of it to improve their behavior. – Bernardo Subercaseaux Aug 26 '20 at 19:39
  • There isn't a concept of "idle time" for RL agents that I know of. Deep RL uses the same amount of computation regardless of the current state of the world. – FourierFlux Aug 26 '20 at 21:47
  • Is Q-learning model-based? What was used to achieve super-human performance on the Atari games? – nbro Aug 26 '20 at 22:08
  • If the RL agent is running in a simulator to learn, I would call the overall approach effectively model-based, even if not explicitly so, since the model has already been constructed. – FourierFlux Aug 26 '20 at 22:26
  • @FourierFlux Well, in the Atari/DQN case the simulator is the game itself. RL algorithms need an environment. So, according to your definition, every RL algorithm is model-based simply because there's an environment, and the distinction between model-based and model-free is meaningless. Please, have a look at [Neil's answer](https://ai.stackexchange.com/a/6733/2444) and [my answer](https://ai.stackexchange.com/a/8820/2444) to see the difference between model-based and model-free. – nbro Aug 26 '20 at 22:31
  • The Atari games' tick speed has been dramatically increased to allow many games to be played quickly, which effectively means the agent is running in a simulator. I have yet to see any "model-free" agent, according to your definition, learn anything in real time. – FourierFlux Aug 26 '20 at 22:44
  • Model-based RL is defined as an algorithm where the agent uses learned dynamics of the system to help with learning. Q-learning does not do this (nor does DQN) and is strictly _model-free_. – David Aug 26 '20 at 23:49