For questions about model-free reinforcement learning methods (or algorithms). An example of a model-free algorithm is Q-learning, which does not use the transition function (i.e. the model) of the environment (or Markov decision process).
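As a rough illustration of the description above, here is a minimal tabular Q-learning sketch. The environment (`chain_step`, a 5-state corridor), the function names, and all hyperparameters are hypothetical choices made for this example. The point is that the learner only calls `step` to sample a transition and never reads the transition probabilities themselves.

```python
import random

def q_learning(step, n_states, n_actions, start_state, episodes=2000,
               alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning. `step(s, a)` returns (next_state, reward, done).

    The learner treats `step` as a black box: it samples transitions
    but never inspects the transition function p(s', r | s, a).
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = start_state
        for _ in range(max_steps):
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s_next, r, done = step(s, a)
            # Bootstrapped target built from the sampled transition only.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q

def chain_step(s, a):
    """Hypothetical 5-state corridor (states 0..4).

    Action 1 moves right, action 0 moves left (clamped at state 0).
    Reaching state 4 yields reward 1 and ends the episode.
    """
    s_next = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    done = s_next == 4
    return s_next, (1.0 if done else 0.0), done

q_table = q_learning(chain_step, n_states=5, n_actions=2, start_state=0)
# Greedy action in each non-terminal state after training.
greedy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
```

In this toy corridor the greedy policy should end up preferring the "move right" action in every non-terminal state, even though the agent was never given the dynamics.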
Questions tagged [model-free-methods]
23 questions
86 votes, 6 answers
What's the difference between model-free and model-based reinforcement learning?
It seems to me that any model-free learner, learning through trial and error, could be reframed as model-based. In that case, when would model-free learners be…

mynameisvinn

5 votes, 2 answers
How can the policy iteration algorithm be model-free if it uses the transition probabilities?
I'm trying to understand policy iteration in the context of RL. I read an article presenting it and, at some point, pseudocode of the algorithm is given:
What I can't understand is this line:
From what I understand, policy…

Samuel Beaussant

4 votes, 1 answer
Why are state-values alone not sufficient in determining a policy (without a model)?
"If a model is not available, then it is particularly useful to estimate action values (the values of state-action pairs) rather than state values. With a model, state values alone are sufficient to determine a policy; one simply looks ahead one…

stoic-santiago

4 votes, 1 answer
How does policy evaluation work for continuous state space model-free approaches?
Theoretically, a model-based approach for the discrete state and action space can be computed via dynamic programming and solving the Bellman equation.
Let's say you…

calveeen

4 votes, 1 answer
Is the minimax algorithm model-based?
Trying to get my head around model-free and model-based algorithms in RL. In my research, I've seen the search trees created via the minimax algorithm. I presume these trees can only be created with a model-based agent that knows the full…

mason7663

4 votes, 1 answer
Why are model-based methods more sample efficient than model-free methods?
Why do model-based methods use fewer samples than model-free methods? Here, I'm specifically referring to model-based methods in which we have to learn a policy and model. I can only think of two reasons for this question:
We can potentially obtain…

Maybe

4 votes, 1 answer
How do temporal-difference and Monte Carlo methods work, if they do not have access to model?
In value iteration, we have a model of the environment's dynamics, i.e., $p(s', r \mid s, a)$, which we use to update an estimate of the value function.
In the case of temporal-difference and Monte Carlo methods, we do not use $p(s', r \mid s, a)$,…

strongguy122

3 votes, 1 answer
Are model-free and off-policy algorithms the same?
With respect to RL, are model-free and off-policy the same thing, just different terminology? If not, what are the differences? I've read that the policy can be thought of as 'the brain', or decision-making part, of a machine learning application, where…

mason7663

2 votes, 1 answer
How to prove importance sampling ratio is uncorrelated with action-value (or state-value) estimate?
In Sutton & Barto (2nd edition), the following is mentioned on page 150 (p. 172 of the pdf), section 7.4:
the importance sampling ratio has expected value one (Section 5.9) and is uncorrelated with the estimate.
How can we prove the importance…

user529295

2 votes, 1 answer
If we can model the environment, wouldn't be meaningless to use a model-free algorithm?
I am trying to understand the concept of model-free and model-based approaches. As far as I understand, having a model of the environment does not mean that an RL agent has to be model-based. It is about the policy. However, if we can model the…

Ayska

2 votes, 0 answers
What kind of reinforcement learning method does AlphaGo Deepmind use to beat the best human Go player?
In reinforcement learning, there are model-based versus model-free methods. Within model-based ones, there are policy-based and value-based methods.
DeepMind's AlphaGo RL model has beaten the best human Go player. What kind of reinforcement model does…

user781486

2 votes, 1 answer
Into which subcategories can reinforcement learning be divided?
In the course of a scientific work, I will discuss the different types of reinforcement learning. However, I have difficulty finding these different types.
So, into which subcategories can reinforcement learning be divided? For example, the…

jackless

2 votes, 1 answer
What is the relation between Monte Carlo and model-free algorithms?
Monte Carlo (MC) methods are methods that use some form of randomness or sampling. For example, we can use an MC method to approximate the area of a circle inside a square: we generate random 2D points inside the square and count the number of…

nbro
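
The excerpt above is cut off, but the sampling idea it starts to describe can be sketched roughly as follows. This assumes the usual estimator (the fraction of uniformly sampled points that land inside the circle, multiplied by the area of the square); the function name, the unit radius, and the sample count are illustrative choices, not taken from the question.

```python
import math
import random

def estimate_circle_area(n_points, radius=1.0, seed=0):
    """Monte Carlo estimate of the area of a circle inscribed in a square.

    Points are drawn uniformly from the square [-radius, radius]^2 and the
    fraction falling inside the circle is scaled by the square's area.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            inside += 1
    square_area = (2 * radius) ** 2
    return square_area * inside / n_points

area = estimate_circle_area(100_000)
# For radius 1 the exact area is pi, so this is the estimation error.
error = abs(area - math.pi)
```

The estimate only requires the ability to draw samples, which is the same ingredient that MC methods in RL rely on when they average sampled returns instead of using the environment's dynamics.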

1 vote, 1 answer
How does one normalize observations in online reinforcement learning
I was wondering how one would normalize observations to a policy without knowing the upper and lower limits of the environment values. A trivial technique would be to normalize each observation by its maximum value before inputting it into a policy.…

desert_ranger

1 vote, 1 answer
In deep reinforcement learning, what is this model with state as input and value as output?
I was looking at this implementation for creating an agent for playing Tetris using DeepRL.
This model uses "a state based on the statistics of the board after a potential action. All predictions would be compared but the action with the best state…

JeanMi