For questions about value-based reinforcement learning (RL) methods, i.e. algorithms that first learn a value function and then derive the policy from it. Q-learning is a classic example of a value-based RL algorithm.
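As a minimal illustration of the learn-values-then-derive-policy pattern this tag describes, here is a tabular Q-learning sketch. The 5-state chain environment and all hyperparameters are illustrative assumptions, not part of the tag definition:

```python
import random

# Toy 5-state chain (an illustrative assumption): actions 0 = left,
# 1 = right; reward 1 for reaching the rightmost (terminal) state.
N_STATES, ACTIONS = 5, (0, 1)

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):                       # episodes
    s = random.randrange(N_STATES - 1)     # random non-terminal start
    for _ in range(100):                   # step cap per episode
        # epsilon-greedy behaviour policy
        a = random.choice(ACTIONS) if random.random() < eps \
            else max(ACTIONS, key=lambda a: Q[s][a])
        s2, r, done = step(s, a)
        # 1) first, learn the value function (Q-learning update) ...
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break

# 2) ... then derive the policy greedily from the learned values.
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)]
```

Note that the policy here is never represented or learned directly; it is read off the learned Q-table at the end, which is exactly what distinguishes value-based methods from policy-based ones.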
Questions tagged [value-based-methods]
10 questions
5
votes
1 answer
Is reinforcement learning only about determining the value function?
I started reading some reinforcement learning literature, and it seems to me that all approaches to solving reinforcement learning problems are about finding the value function (the state-value function or the action-value function).
Are there any…

Felix P.
- 287
- 1
- 6
4
votes
1 answer
Why are policy gradient methods more effective in high-dimensional action spaces?
David Silver argues, in his Reinforcement Learning course, that policy-based reinforcement learning (RL) is more effective than value-based RL in high-dimensional action spaces. He points out that the implicit policy (e.g., $\epsilon$-greedy) in…

Saucy Goat
- 143
- 4
3
votes
1 answer
What is the advantage of using MCTS with value based methods over value based methods only?
I have been trying to understand why MCTS is very important to the performance of RL agents, and the best description I found was from the paper Bootstrapping from Game Tree Search, which states:
Deterministic, two-player games such as chess provide an…

Hossam
- 33
- 3
3
votes
1 answer
Is it possible for value-based methods to learn stochastic policies?
Is it possible for value-based methods to learn stochastic policies? I'm trying to get a clear picture of the different categories for RL algorithms, and while doing so I started to think about settings where the optimal policy is stochastic…

Krrrl
- 211
- 1
- 10
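On the question above: a common observation (my own sketch, not from the thread) is that a value-based agent can still act stochastically by sampling from a softmax (Boltzmann) distribution over its Q-values, even though the policy is derived from values rather than learned directly. The Q-values below are made-up numbers:

```python
import math

# Softmax (Boltzmann) policy over illustrative Q-values. The temperature
# tau controls stochasticity: tau -> 0 recovers the greedy policy,
# large tau approaches the uniform policy.
def softmax_policy(q_values, tau=1.0):
    exps = [math.exp(q / tau) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

probs = softmax_policy([1.0, 2.0, 0.5])  # action probabilities, sum to 1
```

Whether such a derived distribution can represent the *optimal* stochastic policy is exactly what the question is probing, since the softmax is tied to the Q-values rather than optimized as a policy in its own right.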
2
votes
0 answers
What kind of reinforcement learning method does DeepMind's AlphaGo use to beat the best human Go player?
In reinforcement learning, there are model-based versus model-free methods. Within model-based ones, there are policy-based and value-based methods.
DeepMind's AlphaGo RL model has beaten the best human Go player. What kind of reinforcement model does…

user781486
- 201
- 1
- 5
1
vote
0 answers
Is it possible to combine two policy-based RL agents?
I am developing an RL agent for a game environment. I have found out that there are two strategies to do well in the game. So I have trained two RL agents using neural networks with distinct reward functions. Each reward function corresponds to one…

BlackBrain
- 111
- 2
1
vote
1 answer
Why do we need to have two heads in D3QN to obtain value and advantage separately, if V is the average of Q values?
I have two questions on the Dueling DQN paper. First, I have an issue understanding the identifiability problem that the Dueling DQN paper mentions:
Here is my question: If we have given Q-values $Q(s, a; \theta)$ for all actions, I assume we can get value…

Afshin Oroojlooy
- 175
- 1
- 7
1
vote
0 answers
What are the disadvantages of actor-only methods with respect to value-based ones?
While the advantages of actor-only algorithms, those that search the policy directly without using a value function, are clear (the possibility of a continuous action space, a stochastic policy, etc.), I can't figure out the…

unter_983
- 331
- 1
- 6
1
vote
0 answers
Are policy-based methods better than value-based methods only for large action spaces?
In different books on reinforcement learning, policy-based methods are motivated by their ability to handle large (continuous) action spaces. Is this the only motivation for the policy-based methods? What if the action space is tiny (say, only 9…

tmaric
- 382
- 2
- 8
0
votes
0 answers
As someone starting out in RL, could you help me understand the differences between actor-only, critic-only, and actor-critic methods?
I have been reading some Medium articles, and these three methods pop up a lot. I am wondering what the differences between them are, what the advantages of one over the others are, etc. Also, from my understanding, actor-only method is synonymous to…

No-Time-To-Day
- 101
- 2