For questions about model-based reinforcement learning methods (or algorithms). An example of a model-based algorithm is Dyna-Q, which estimates a model of the environment (i.e. the transition function of the associated Markov decision process).
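As a rough illustration of the Dyna-Q idea mentioned in the tag description, here is a minimal tabular sketch. It assumes a deterministic environment, so the learned model can simply store the last observed outcome for each state-action pair. The `chain_step` environment, the function names, and all hyperparameters are hypothetical choices for this example and are not taken from any particular library or from the questions below.

```python
import random
from collections import defaultdict


def dyna_q(step, start_state, actions, episodes=50, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch (assumes a deterministic environment)."""
    rng = random.Random(seed)
    q = defaultdict(float)   # action values, keyed by (state, action)
    model = {}               # learned model: (state, action) -> (reward, next_state, done)

    def greedy(s):
        # Pick a best action, breaking ties at random.
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning update.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = start_state, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)        # learn from real experience
            model[(s, a)] = (r, s2, done)    # update the estimated model
            for _ in range(planning_steps):  # learn from simulated experience
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q, model


def chain_step(s, a):
    """Hypothetical 5-state chain: actions are -1/+1, reward 1.0 on reaching state 4."""
    s2 = min(max(s + a, 0), 4)
    done = s2 == 4
    return (1.0 if done else 0.0), s2, done


q, model = dyna_q(chain_step, start_state=0, actions=[-1, 1])
```

The planning loop is what makes this model-based: the same update rule is applied to transitions replayed from `model` rather than from the environment. Setting `planning_steps=0` reduces the sketch to plain Q-learning.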
Questions tagged [model-based-methods]
40 questions
86 votes, 6 answers
What's the difference between model-free and model-based reinforcement learning?
It seems to me that any model-free learner, learning through trial and error, could be reframed as model-based. In that case, when would model-free learners be…

mynameisvinn
6 votes, 2 answers
Are there RL algorithms that also try to predict the next state?
So far I've developed simple RL algorithms, like Deep Q-Learning and Double Deep Q-Learning. Also, I read a bit about A3C and policy gradient but superficially.
If I remember correctly, all these algorithms focus on the value of the action and try…

Ram Rachum
5 votes, 3 answers
Isn't a simulation a great model for model-based reinforcement learning?
Most reinforcement learning agents are trained in simulated environments. The goal is to maximize performance in (often) the same environment, preferably with a minimum amount of interactions. Having a good model of the environment allows to use…

Ray Walker
5 votes, 2 answers
How can the policy iteration algorithm be model-free if it uses the transition probabilities?
I'm trying to understand policy iteration in the context of RL. I read an article presenting it and, at some point, pseudocode of the algorithm is given:
What I can't understand is this line:
From what I understand, policy…

Samuel Beaussant
4 votes, 1 answer
How does a model-based agent learn the model?
I want to build a model-based RL agent. I am wondering about the process of building the model.
If I already have data from real experience:
$S_1, a \rightarrow R,S_2$
$S_2, a \rightarrow R,S_3$
Can I use this information to build model-based RL? Or it…

user46045
4 votes, 1 answer
What is the difference between a distribution model and a sampling model in Reinforcement Learning?
The book by Sutton and Barto, Reinforcement Learning: An Introduction, defines a model in reinforcement learning as
something that mimics the behavior of the environment, or more generally, that allows inferences to be made about how the…

A. Pesare
4 votes, 1 answer
Is the state transition matrix known to the agents in a Markov decision process?
The question is more or less in the title.
A Markov decision process consists of a state space, a set of actions, the transition probabilities and the reward function. If I now take an agent's point of view, does this agent "know" the transition…

Felix P.
4 votes, 1 answer
Is the minimax algorithm model-based?
Trying to get my head around model-free and model-based algorithms in RL. In my research, I've seen the search trees created via the minimax algorithm. I presume these trees can only be created with a model-based agent that knows the full…

mason7663
4 votes, 1 answer
Why are model-based methods more sample efficient than model-free methods?
Why do model-based methods use fewer samples than model-free methods? Here, I'm specifically referring to model-based methods in which we have to learn a policy and model. I can only think of two reasons for this question:
We can potentially obtain…

Maybe
4 votes, 1 answer
How do temporal-difference and Monte Carlo methods work, if they do not have access to a model?
In value iteration, we have a model of the environment's dynamics, i.e. $p(s', r \mid s, a)$, which we use to update an estimate of the value function.
In the case of temporal-difference and Monte Carlo methods, we do not use $p(s', r \mid s, a)$,…

strongguy122
3 votes, 2 answers
Is Q-learning a type of model-based RL?
Model-based RL creates a model of the transition function.
Tabular Q-Learning does this iteratively (without directly optimizing for the transition function). So, does this make tabular Q-learning a type of model-based RL?

echo
3 votes, 1 answer
Why is learning $s'$ from $s,a$ a kernel density estimation problem but learning $r$ from $s,a$ is just regression?
In David Silver's 8th lecture, he talks about model learning and says that learning $r$ from $s,a$ is a regression problem, whereas learning $s'$ from $s,a$ is a kernel density estimation problem. His explanation for the difference is that if we are in a…

David
2 votes, 1 answer
Model-based RL algorithms for continuous state space and finite action space
Suppose that, at the beginning, I have a complete model $p(s' \mid s, a)$ (an assumed true model that describes the environment well enough) and the reward function $r(s,a,s')$. How can I exploit the model and learn a good policy in this situation? Assume that…

k2pctdn
2 votes, 1 answer
If we can model the environment, wouldn't it be meaningless to use a model-free algorithm?
I am trying to understand the concept of model-free and model-based approaches. As far as I understand, having a model of the environment does not mean that an RL agent has to be model-based. It is about the policy. However, if we can model the…

Ayska
2 votes, 0 answers
What kind of reinforcement learning method does DeepMind's AlphaGo use to beat the best human Go player?
In reinforcement learning, there are model-based versus model-free methods. Within model-based ones, there are policy-based and value-based methods.
DeepMind's AlphaGo RL model has beaten the best human Go player. What kind of reinforcement model does…

user781486