3

An agent can have reasoning skills (prediction, taking calculated guesses, etc.), and those skills can help the agent's reinforcement learning. Of course, reinforcement learning itself can help develop reasoning skills. Is there research that explores this impact of reasoning and consciousness on the effectiveness of reinforcement learning? Or do people just sit and wait for such skills to emerge during reinforcement learning?

TomR
  • 823
  • 5
  • 15

1 Answer

1

It sounds like you are describing a synthesis of two competing ways to solve the MDP problem.

In reinforcement learning, we solve the MDP problem by having the agent move around its environment, observe rewards and transitions in response to the actions it takes, and build a model of the relationship between actions and rewards that allows it to maximize rewards.
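That loop (act, observe reward and transition, update the action-value estimate) can be sketched with tabular Q-learning on a toy environment. Everything here is an illustrative assumption, not anything from the question: a five-state corridor where only the step into the rightmost state pays a reward.

```python
import random

# Hypothetical toy environment: states 0..4, reward 1.0 for
# arriving at state 4, which ends the episode. The agent does NOT
# know these rules; it only observes transitions and rewards.
N_STATES = 5
ACTIONS = [-1, +1]                 # move left / move right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2  # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment dynamics, hidden from the agent a priori."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

random.seed(0)
for episode in range(300):
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy: explore occasionally, otherwise exploit.
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # Q-learning update: nudge the estimate toward the observed
        # reward plus the discounted value of the best next action.
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# The learned greedy policy moves right in every non-terminal state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)])
          for s in range(N_STATES)}
print(policy)
```

No model of the environment's rules is ever written down; the action-value table is built purely from experienced transitions.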

An older approach is to give the agent facts about the world that can be encoded as logical rules. The agent then uses unification to reason about the consequences of actions within that framework of rules, and takes the actions that maximize the rewards it can expect, given the rules and the information at hand. A problem with this approach was that it did not work well in problem domains with probabilistic rules (e.g. "X *usually* happens when action Y is taken").
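The flavor of that older approach can be shown with a toy forward-chaining engine: facts and a rule are given up front, and consequences are derived until a fixed point. The predicates (`at`, `door`, `reachable`) and the single rule are hypothetical illustrations, not a real unification engine.

```python
# Facts the agent is given about the world (all names are made up
# for illustration).
facts = {
    ("at", "agent", "room1"),
    ("door", "room1", "room2"),
    ("door", "room2", "room3"),
}

def derive(facts):
    """Apply the rules once, returning any newly derivable facts.
    Rule 1: the agent's own location is reachable.
    Rule 2: if X is reachable and a door leads X -> Y, Y is reachable."""
    new = set()
    for f in facts:
        if f[0] == "at" and f[1] == "agent":
            new.add(("reachable", f[2]))
        if f[0] == "reachable":
            for g in facts:
                if g[0] == "door" and g[1] == f[1]:
                    new.add(("reachable", g[2]))
    return new

# Forward chaining: keep applying the rules until nothing new appears.
while True:
    new_facts = derive(facts) - facts
    if not new_facts:
        break
    facts |= new_facts

print(sorted(f for f in facts if f[0] == "reachable"))
```

Note that every rule here is deterministic; there is no natural place in this machinery to encode "the door opens only 70% of the time", which is exactly the weakness described above.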

A hybrid approach, somewhere between these two, is the use of Value Iteration or Policy Iteration methods. These are so-called "model-based" reinforcement learning algorithms (although I would tend to say that makes them something other than true reinforcement learning...). Like the older logic-based approaches, they start by writing down a series of rules that fully describe how things happen in the world, and then derive the logical consequences of those rules to compute the best actions the agent could take. Like reinforcement learning, however, they are able to account for probabilistic rules and probabilistic rewards. If you had an exact description of a game of chance, you could write it down as an MDP, and then solve it exactly using these techniques.
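To make the contrast concrete, here is value iteration solving the same kind of toy corridor MDP as a model-free learner would explore, except the transition and reward rules are now written down in full and the optimal values fall out of repeated Bellman backups. The MDP itself is an illustrative assumption.

```python
# Fully specified toy MDP: states 0..4, state 4 terminal, reward 1.0
# for the transition into state 4. Unlike model-free RL, these rules
# are known to the solver in advance.
N_STATES = 5
ACTIONS = [-1, +1]
GAMMA = 0.9

def transition(s, a):
    """Deterministic dynamics: move, clipped at the corridor ends."""
    return max(0, min(N_STATES - 1, s + a))

def reward(s, a, s2):
    return 1.0 if s2 == N_STATES - 1 and s != N_STATES - 1 else 0.0

V = [0.0] * N_STATES
for _ in range(100):
    new_V = []
    for s in range(N_STATES):
        if s == N_STATES - 1:      # terminal state has value 0
            new_V.append(0.0)
            continue
        # Bellman optimality backup: best action under current estimate.
        new_V.append(max(reward(s, a, transition(s, a))
                         + GAMMA * V[transition(s, a)]
                         for a in ACTIONS))
    V = new_V

print([round(v, 3) for v in V])
```

For this chain the values converge to `GAMMA` raised to the number of steps remaining to the goal (0.729, 0.81, 0.9, 1.0, then 0 at the terminal state). For a stochastic MDP, the backup would instead take an expectation over next states weighted by the known transition probabilities, which is exactly how these methods accommodate probabilistic rules.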

Importantly, value & policy iteration methods are not feasible if the state and action spaces are very large (as they often are), and are really not feasible if the MDP is not known exactly (i.e. if you don't know all the rules of the game in advance). That's where reinforcement learning shines.

John Doucette
  • 9,147
  • 1
  • 17
  • 52