Questions tagged [q-learning]

For questions related to the Q-learning algorithm, a model-free, temporal-difference reinforcement learning algorithm that attempts to approximate the optimal Q function, i.e. the function that, given a state $s$ and an action $a$, returns the expected return obtained by taking action $a$ in state $s$ and following the optimal policy thereafter. Q-learning was introduced in Watkins' PhD thesis "Learning from Delayed Rewards" (1989).

For more information, see the book Reinforcement Learning: An Introduction (2nd edition) by Sutton and Barto, the related Wikipedia article, or http://artint.info/html/ArtInt_265.html

369 questions
44 votes · 2 answers

What is the relation between Q-learning and policy gradient methods?

As far as I understand, Q-learning and policy gradients (PG) are the two major approaches used to solve RL problems. While Q-learning aims to predict the reward of a certain action taken in a certain state, policy gradients directly predict the…
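For orientation, these are the textbook forms of the two kinds of update (with REINFORCE as a representative policy-gradient method); general background, not a summary of the answers:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] \quad \text{(Q-learning: learn values, act greedily)}$$

$$\theta \leftarrow \theta + \alpha \, G_t \, \nabla_\theta \log \pi_\theta(A_t \mid S_t) \quad \text{(REINFORCE: adjust the policy directly)}$$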
22 votes · 3 answers

Why doesn't Q-learning converge when using function approximation?

The tabular Q-learning algorithm is guaranteed to find the optimal $Q$ function, $Q^*$, provided the following conditions (the Robbins-Monro conditions) on the learning rate are satisfied: $\sum_{t} \alpha_t(s, a) = \infty$ and $\sum_{t}…
nbro
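The excerpt above is cut off; for reference, the Robbins-Monro conditions on the learning rate are usually stated as

$$\sum_{t} \alpha_t(s, a) = \infty \qquad \text{and} \qquad \sum_{t} \alpha_t^2(s, a) < \infty,$$

i.e. the step sizes must be large enough to overcome initial conditions, but decay fast enough to suppress noise.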
18 votes · 2 answers

Can Q-learning be used for continuous (state or action) spaces?

Many examples work with a table-based method for Q-learning. This may be suitable for a discrete state (observation) or action space, like a robot in a grid world, but is there a way to use Q-learning for continuous spaces like the control of a…
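A minimal sketch of the standard workaround for continuous *states*: semi-gradient Q-learning with a function approximator instead of a table (all names and the random feature map below are illustrative assumptions, not taken from any answer). Continuous *actions* need different machinery, since the max over actions is no longer a table lookup:

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_features, n_actions = 3, 16, 4
W_feat = rng.normal(size=(state_dim, n_features))  # fixed random feature map (illustrative)
w = np.zeros((n_actions, n_features))              # learned weights, one row per action

def phi(s):
    return np.tanh(s @ W_feat)          # features of a continuous state vector

def q(s, a):
    return w[a] @ phi(s)                # Q(s, a) = w_a . phi(s), no table needed

def q_learning_step(s, a, r, s_next, alpha=0.1, gamma=0.99):
    # semi-gradient Q-learning: bootstrap from the greedy value at s_next
    target = r + gamma * max(q(s_next, b) for b in range(n_actions))
    w[a] += alpha * (target - q(s, a)) * phi(s)
```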
17 votes · 1 answer

Why does DQN require two different networks?

I was going through this implementation of DQN and I see that on lines 124 and 125 two different Q networks have been initialized. From my understanding, I think one network predicts the appropriate action and the second network predicts the target Q…
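A self-contained toy sketch of the role the two copies play (the `QNetwork` class below is a stand-in written for illustration, not the code the question links to):

```python
import numpy as np

class QNetwork:
    """Toy linear stand-in for a deep Q-network (illustration only)."""
    def __init__(self, state_dim=4, n_actions=2):
        self.w = np.zeros((n_actions, state_dim))
    def predict(self, s):
        return self.w @ s                    # Q-values for every action
    def copy_weights_from(self, other):
        self.w = other.w.copy()

online = QNetwork()   # trained every step; also used to pick actions
target = QNetwork()   # frozen copy that supplies the bootstrap value

def td_target(r, s_next, gamma=0.99):
    # bootstrapping from the slowly-moving copy keeps the regression
    # target from chasing the network that is being updated
    return r + gamma * np.max(target.predict(s_next))

# every C training steps, re-sync the frozen copy:
target.copy_weights_from(online)
```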
16 votes · 2 answers

What is the difference between Q-learning, Deep Q-learning and Deep Q-network?

Q-learning uses a table to store all state-action pairs. Q-learning is a model-free RL algorithm, so how can there be one called Deep Q-learning, since "deep" means using a DNN? Or maybe the state-action table (Q-table) is still there but the DNN is…
Dee
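To make the excerpt concrete: in tabular Q-learning the "model" really is just an array, and Deep Q-learning swaps that array for a neural network $Q(s, a; \theta)$ while keeping the same update target. A minimal tabular sketch:

```python
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))   # the entire Q-"table"

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    # one Q-learning step; a DQN replaces Q[s] with a network forward pass
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```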
11 votes · 2 answers

How do we prove the n-step return error reduction property?

In section 7.1 (on n-step bootstrapping) of the book Reinforcement Learning: An Introduction (2nd edition), by Richard S. Sutton and Andrew G. Barto, the authors write about what they call the "n-step return error reduction property": But they…
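For context, the property discussed in that section states that the worst-case error of the expected $n$-step return is bounded by $\gamma^n$ times the worst-case error of the current value estimate:

$$\max_s \left| \mathbb{E}_\pi\!\left[ G_{t:t+n} \mid S_t = s \right] - v_\pi(s) \right| \;\le\; \gamma^n \max_s \left| V_{t+n-1}(s) - v_\pi(s) \right|$$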
11 votes · 1 answer

What exactly is the advantage of double DQN over DQN?

I started looking into the double DQN (DDQN). Apparently, the difference between DDQN and DQN is that in DDQN we use the main value network for action selection and the target network for outputting the Q values. However, I don't understand why…
Chukwudi
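In symbols, the difference the question describes: DQN both selects and evaluates the bootstrap action with the target parameters $\theta^-$, while double DQN selects with the online parameters $\theta$ and evaluates with $\theta^-$, which is intended to reduce the overestimation bias of the max operator:

$$y^{\text{DQN}} = r + \gamma \max_{a'} Q(s', a'; \theta^-), \qquad y^{\text{DDQN}} = r + \gamma \, Q\!\big(s', \arg\max_{a'} Q(s', a'; \theta); \, \theta^-\big)$$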
11 votes · 1 answer

Are Q-learning and SARSA the same when action selection is greedy?

I'm currently studying reinforcement learning and I'm having difficulties with question 6.12 in Sutton and Barto's book. Suppose action selection is greedy. Is Q-learning then exactly the same algorithm as SARSA? Will they make exactly the same…
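A hint at why the exercise is subtle: under greedy action selection the two update *targets* coincide, since

$$R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) \;=\; R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) \quad \text{when } A_{t+1} = \arg\max_{a} Q(S_{t+1}, a),$$

yet the two algorithms update at different points relative to action selection, which is exactly what the exercise asks the reader to think through.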
10 votes · 1 answer

Can Q-learning be used in a POMDP?

Can Q-learning (and SARSA) be directly used in a Partially Observable Markov Decision Process (POMDP)? If not, why not? My intuition is that the policies learned will be terrible because of partial observability. Are there ways to transform these…
9 votes · 1 answer

How does Q-learning work in stochastic environments?

The Q function uses the (current and future) states to determine the action that gets the highest reward. However, in a stochastic environment, the current action (at the current state) does not determine the next state. How does Q-learning handle…
redlum
8 votes · 1 answer

What are other ways of handling invalid actions in scenarios where all rewards are either 0 (best reward) or negative?

I created an OpenAI Gym environment, and I would like to check the performance of the agent from OpenAI Baselines DQN approach on it. In my environment, the best possible outcome for the agent is 0 - the robot needs zero non-necessary resources to…
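One widely used technique, sketched generically below (this is not the OpenAI Baselines API; `masked_greedy_action` is a name made up for illustration), is to mask invalid actions out of the argmax so they can never be selected, regardless of the reward scale:

```python
import numpy as np

def masked_greedy_action(q_values, valid_mask):
    """Greedy action restricted to valid actions.

    q_values:   Q-value estimate per action
    valid_mask: True where the action is legal in the current state
    """
    masked = np.where(valid_mask, q_values, -np.inf)  # invalid actions lose every argmax
    return int(np.argmax(masked))

# works even when all rewards (and hence Q-values) are <= 0:
q = np.array([-1.2, -0.3, -4.0])
mask = np.array([True, False, True])
assert masked_greedy_action(q, mask) == 0
```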
8 votes · 1 answer

How is the DQN loss derived from (or theoretically motivated by) the Bellman equation, and how is it related to the Q-learning update?

I'm doing a project on Reinforcement Learning. I programmed an agent that uses DDQN. There are a lot of tutorials on that, so the code implementation was not that hard. However, I have problems understanding how one should come up with this kind of…
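For reference, the connection usually drawn: DQN regresses $Q(s, a; \theta)$ towards the sampled Bellman optimality target, which is the same quantity the tabular Q-learning update moves its estimate towards, just turned into a squared-error objective over a replay buffer $\mathcal{D}$:

$$L(\theta) = \mathbb{E}_{(s, a, r, s') \sim \mathcal{D}} \Big[ \big( r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \big)^2 \Big]$$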
8 votes · 2 answers

What are some online courses for deep reinforcement learning?

What are some (good) online courses for deep reinforcement learning? I would like the course to be both programming and theoretical. I really liked David Silver's course, but the course dates from 2015. It doesn't really teach deep Q-learning at…
8 votes · 1 answer

Does AlphaZero use Q-Learning?

I was reading the AlphaZero paper Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, and it seems they don't mention Q-Learning anywhere. So does AZ use Q-Learning on the results of self-play or just a Supervised…
8 votes · 2 answers

Can DQN perform better than Double DQN?

I'm training both DQN and double DQN in the same environment, but DQN performs significantly better than double DQN. As I've seen in the double DQN paper, double DQN should perform better than DQN. Am I doing something wrong or is it possible?
Angelo