
Model-based RL builds a model of the environment's transition function.

Tabular Q-learning also builds up a table iteratively (without directly optimizing for the transition function). So, does this make tabular Q-learning a type of model-based RL?

echo
I guess you are talking about the Q-table? That is not the same as the transition model. – okkhoy Dec 28 '17 at 02:27

2 Answers


Tabular Q-Learning does not explicitly create a model of the transition function. It does not generate any output that you can afterwards use as a function to predict what the next state s' will be given a current state s and an action a (that's what a transition function would allow you to do). So no, Q-learning is still model-free.
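To make that distinction concrete, here is a minimal sketch of tabular Q-learning on a made-up 5-state chain environment (the environment, constants, and hyperparameters are illustrative assumptions, not anything from the question). Notice that the only thing stored is the Q-table; no mapping from (s, a) to s' is ever built:

```python
import random

# Hypothetical toy environment: a 5-state chain where action 1 moves right,
# action 0 moves left; reaching state 4 yields reward 1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    next_state = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # the only thing learned

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection straight from the Q-table
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # Q-learning update: bootstraps from the *sampled* next state;
        # no transition probabilities are ever estimated or stored.
        bootstrap = 0.0 if done else gamma * max(Q[next_state])
        Q[state][action] += alpha * (reward + bootstrap - Q[state][action])
        state = next_state
```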

By the way, model-based RL does not necessarily involve creating a model of the transition function. Model-based RL can also mean that you assume such a function is already given. It just means that you use such a function in some way.
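For contrast, here is a sketch of what "using a given model" can look like: value iteration, which plans directly with an assumed-known transition model and never interacts with the environment (the data structure chosen for `P` is a hypothetical convention for illustration):

```python
def value_iteration(P, n_states, n_actions, gamma=0.99, theta=1e-8):
    """Plan with a *given* model; nothing is learned from experience.

    P[s][a] is a list of (prob, next_state, reward) tuples, assumed known.
    """
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup using the known model directly
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in range(n_actions)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # stop once values have converged
            return V
```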

Dennis Soemers

In model-based learning, the agent exploits a model of the environment (learned or given) to accomplish a task, whereas in model-free RL the agent does not use a model of the environment at all; it relies directly on trial-and-error experience for action selection. The similarity is that in both approaches the agent tries to maximize the reward from its actions.

Q-learning is a model-free RL method. It can be used to find an optimal action-selection policy for any given finite Markov Decision Process. It works by learning an action-value function, which gives the expected utility of taking a given action in a given state and following the optimal policy thereafter.
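For reference, the standard one-step tabular Q-learning update is

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right],$$

where $\alpha$ is the learning rate and $\gamma$ the discount factor. The sampled next state $s'$ is plugged in directly, which is exactly why no transition model is needed.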

Seth Simba