Questions tagged [value-based-methods]

For questions about value-based reinforcement learning (RL) methods (or algorithms), which first learn a value function and then derive the policy from it. An example of a value-based RL algorithm is Q-learning.
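
A minimal sketch of what the tag description means in code (an assumed toy setup, not tied to any specific question below): tabular Q-learning first learns the action-value function Q(s, a), and only afterwards derives a policy by acting greedily with respect to it. The environment dynamics, sizes, and hyperparameters here are placeholders.

```python
import numpy as np

# Tabular Q-learning sketch: learn Q(s, a) first, derive the policy from it.
n_states, n_actions = 16, 4          # assumed toy problem sizes
alpha, gamma, epsilon = 0.1, 0.99, 0.1

Q = np.zeros((n_states, n_actions))  # the learned value function

def step(state, action):
    """Placeholder dynamics; swap in a real environment in practice."""
    next_state = np.random.randint(n_states)
    reward = float(next_state == n_states - 1)
    done = next_state == n_states - 1
    return next_state, reward, done

for episode in range(1000):
    state, done = 0, False
    while not done:
        # Behave epsilon-greedily with respect to the current Q estimates.
        if np.random.rand() < epsilon:
            action = np.random.randint(n_actions)
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update: bootstrap from the greedy value of the next state.
        target = reward + gamma * np.max(Q[next_state]) * (not done)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

# The policy is derived from the learned values only at the end.
greedy_policy = Q.argmax(axis=1)
```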

10 questions
5
votes
1 answer

Is reinforcement learning only about determining the value function?

I started reading some reinforcement learning literature, and it seems to me that all approaches to solving reinforcement learning problems are about finding the value function (the state-value function or the state-action value function). Are there any…
4
votes
1 answer

Why are policy gradient methods more effective in high-dimensional action spaces?

David Silver argues, in his Reinforcement Learning course, that policy-based reinforcement learning (RL) is more effective than value-based RL in high-dimensional action spaces. He points out that the implicit policy (e.g., $\epsilon$-greedy) in…
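
A rough sketch of the contrast behind this question, assuming a continuous one-dimensional action and a made-up critic: the implicit greedy policy of a value-based method needs an inner maximisation over actions, while a parameterised policy outputs an action (or a distribution over actions) directly.

```python
import numpy as np

# Stand-in critic for a continuous action in [-1, 1]; purely illustrative.
def q(state, action):
    return -(action - 0.37) ** 2

state = None  # the state is irrelevant for this toy critic

# Value-based acting: the greedy policy is argmax_a Q(s, a), which over a
# continuous action space is itself an optimisation problem. Here it is
# crudely approximated by scanning a grid; in high-dimensional action
# spaces any such inner maximisation quickly becomes intractable.
grid = np.linspace(-1.0, 1.0, 10_001)
greedy_action = grid[np.argmax([q(state, a) for a in grid])]

# Policy-based acting: a parameterised policy emits the action (or the
# parameters of a distribution over actions) directly, with no inner max.
policy_mean, policy_std = 0.35, 0.05   # illustrative Gaussian policy head
sampled_action = np.random.normal(policy_mean, policy_std)
```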
3
votes
1 answer

What is the advantage of using MCTS with value-based methods over value-based methods only?

I have been trying to understand why MCTS is so important to the performance of RL agents, and the best description I found was from the paper Bootstrapping from Game Tree Search, which states: Deterministic, two-player games such as chess provide an…
3
votes
1 answer

Is it possible for value-based methods to learn stochastic policies?

Is it possible for value-based methods to learn stochastic policies? I'm trying to get a clear picture of the different categories for RL algorithms, and while doing so I started to think about settings where the optimal policy is stochastic…
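
One construction relevant to this question (a sketch with made-up numbers, not necessarily what any specific algorithm does): a Boltzmann/softmax policy over learned Q-values is stochastic even though the underlying method is value-based.

```python
import numpy as np

# Boltzmann (softmax) policy over Q-values: value-based, yet stochastic.
q_values = np.array([1.0, 1.2, 0.8])   # illustrative Q(s, ·) estimates
temperature = 0.5                       # lower -> closer to greedy

logits = q_values / temperature
probs = np.exp(logits - logits.max())   # subtract max for numerical stability
probs /= probs.sum()

action = np.random.choice(len(q_values), p=probs)
```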
2
votes
0 answers

What kind of reinforcement learning method does DeepMind's AlphaGo use to beat the best human Go player?

In reinforcement learning, there are model-based versus model-free methods. Within model-based ones, there are policy-based and value-based methods. DeepMind's AlphaGo RL model has beaten the best human Go player. What kind of reinforcement model does…
1
vote
0 answers

Is it possible to combine two policy-based RL agents?

I am developing an RL agent for a game environment. I have found that there are two strategies for doing well in the game, so I have trained two RL agents using neural networks with distinct reward functions. Each reward function corresponds to one…
1
vote
1 answer

Why do we need to have two heads in D3QN to obtain value and advantage separately, if V is the average of Q values?

I have two questions on the Dueling DQN paper. First, I have an issue understanding the identifiability problem that the Dueling DQN paper mentions. Here is my question: if we are given Q-values $Q(s, a; \theta)$ for all actions, I assume we can get value…
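
For background on the excerpt above: one of the aggregations the Dueling DQN paper uses subtracts the mean advantage, i.e. $Q(s, a) = V(s) + A(s, a) - \frac{1}{|\mathcal{A}|}\sum_{a'} A(s, a')$, which is what makes $V$ and $A$ identifiable. Below is a rough sketch of that head with made-up layer sizes, not the paper's actual network.

```python
import numpy as np

# Dueling aggregation sketch (illustrative sizes, linear heads only).
rng = np.random.default_rng(0)
n_features, n_actions = 64, 6
features = rng.standard_normal(n_features)       # shared torso output

w_value = rng.standard_normal((n_features, 1))
w_advantage = rng.standard_normal((n_features, n_actions))

value = features @ w_value                       # V(s), shape (1,)
advantage = features @ w_advantage               # A(s, a), shape (n_actions,)

# Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
q_values = value + advantage - advantage.mean()

# Under this aggregation the mean of the Q-values recovers V(s),
# which is the relationship the identifiability discussion hinges on.
print(q_values.mean(), value.item())
```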
1
vote
0 answers

What are the disadvantages of actor-only methods with respect to value-based ones?

While the advantages of actor-only algorithms, i.e. those that search directly in policy space without using a value function, are clear (the possibility of a continuous action space, a stochastic policy, etc.), I can't figure out the…
1
vote
0 answers

Are policy-based methods better than value-based methods only for large action spaces?

In different books on reinforcement learning, policy-based methods are motivated by their ability to handle large (continuous) action spaces. Is this the only motivation for policy-based methods? What if the action space is tiny (say, only 9…
0
votes
0 answers

As someone starting out in RL, could you help me understand the differences between actor-only, critic-only, and actor-critic methods?

I have been reading some Medium articles, and these three methods pop up a lot. I am wondering what the differences between them are, what the advantages of one over the others are, etc. Also, from my understanding, the actor-only method is synonymous to…