Questions tagged [q-learning]

For questions related to the Q-learning algorithm, a model-free, temporal-difference reinforcement learning algorithm that attempts to approximate the optimal Q function, i.e. the function that, given a state $s$ and an action $a$, returns the expected return obtained by taking action $a$ in state $s$ and following the optimal policy thereafter. Q-learning was introduced in Watkins' PhD thesis "Learning from Delayed Rewards" (1989).

For more information, see the book Reinforcement Learning: An Introduction (2nd edition) by Sutton and Barto, the related Wikipedia article, or http://artint.info/html/ArtInt_265.html

369 questions
44 votes · 2 answers

What is the relation between Q-learning and policy gradient methods?

As far as I understand, Q-learning and policy gradients (PG) are the two major approaches used to solve RL problems. While Q-learning aims to predict the reward of a certain action taken in a certain state, policy gradients directly predict the…
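For orientation, these are the textbook forms of the two kinds of update (with REINFORCE as a representative policy-gradient method); general background, not a summary of the answers:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] \quad \text{(Q-learning: learn values, act greedily)}$$

$$\theta \leftarrow \theta + \alpha \, G_t \, \nabla_\theta \log \pi_\theta(A_t \mid S_t) \quad \text{(REINFORCE: adjust the policy directly)}$$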
22 votes · 3 answers

Why doesn't Q-learning converge when using function approximation?

The tabular Q-learning algorithm is guaranteed to find the optimal $Q$ function, $Q^*$, provided the following conditions (the Robbins-Monro conditions) on the learning rate are satisfied: $\sum_{t} \alpha_t(s, a) = \infty$ and $\sum_{t}…
nbro
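The excerpt above is cut off; for reference, the Robbins-Monro conditions on the learning rate are usually stated as

$$\sum_{t} \alpha_t(s, a) = \infty \qquad \text{and} \qquad \sum_{t} \alpha_t^2(s, a) < \infty,$$

i.e. the step sizes must be large enough to overcome initial conditions, but decay fast enough to suppress noise.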
18 votes · 2 answers

Can Q-learning be used for continuous (state or action) spaces?

Many examples work with a table-based method for Q-learning. This may be suitable for a discrete state (observation) or action space, like a robot in a grid world, but is there a way to use Q-learning for continuous spaces like the control of a…
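A minimal sketch of the standard workaround for continuous *states*: semi-gradient Q-learning with a function approximator instead of a table (all names and the random feature map below are illustrative assumptions, not taken from any answer). Continuous *actions* need different machinery, since the max over actions is no longer a table lookup:

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_features, n_actions = 3, 16, 4
W_feat = rng.normal(size=(state_dim, n_features))  # fixed random feature map (illustrative)
w = np.zeros((n_actions, n_features))              # learned weights, one row per action

def phi(s):
    return np.tanh(s @ W_feat)          # features of a continuous state vector

def q(s, a):
    return w[a] @ phi(s)                # Q(s, a) = w_a . phi(s), no table needed

def q_learning_step(s, a, r, s_next, alpha=0.1, gamma=0.99):
    # semi-gradient Q-learning: bootstrap from the greedy value at s_next
    target = r + gamma * max(q(s_next, b) for b in range(n_actions))
    w[a] += alpha * (target - q(s, a)) * phi(s)
```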
17 votes · 1 answer

Why does DQN require two different networks?

I was going through this implementation of DQN and I see that on lines 124 and 125 two different Q networks have been initialized. From my understanding, I think one network predicts the appropriate action and the second network predicts the target Q…
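A self-contained toy sketch of the role the two copies play (the `QNetwork` class below is a stand-in written for illustration, not the code the question links to):

```python
import numpy as np

class QNetwork:
    """Toy linear stand-in for a deep Q-network (illustration only)."""
    def __init__(self, state_dim=4, n_actions=2):
        self.w = np.zeros((n_actions, state_dim))
    def predict(self, s):
        return self.w @ s                    # Q-values for every action
    def copy_weights_from(self, other):
        self.w = other.w.copy()

online = QNetwork()   # trained every step; also used to pick actions
target = QNetwork()   # frozen copy that supplies the bootstrap value

def td_target(r, s_next, gamma=0.99):
    # bootstrapping from the slowly-moving copy keeps the regression
    # target from chasing the network that is being updated
    return r + gamma * np.max(target.predict(s_next))

# every C training steps, re-sync the frozen copy:
target.copy_weights_from(online)
```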
16 votes · 2 answers

What is the difference between Q-learning, Deep Q-learning and Deep Q-network?

Q-learning uses a table to store all state-action pairs. Q-learning is a model-free RL algorithm, so how can there be one called Deep Q-learning, since "deep" means using a DNN? Or maybe the state-action table (Q-table) is still there but the DNN is…
Dee
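To make the excerpt concrete: in tabular Q-learning the "model" really is just an array, and Deep Q-learning swaps that array for a neural network $Q(s, a; \theta)$ while keeping the same update target. A minimal tabular sketch:

```python
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))   # the entire Q-"table"

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    # one Q-learning step; a DQN replaces Q[s] with a network forward pass
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```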
11 votes · 2 answers

How do we prove the n-step return error reduction property?

In section 7.1 (on n-step bootstrapping) of the book Reinforcement Learning: An Introduction (2nd edition), by Richard S. Sutton and Andrew G. Barto, the authors write about what they call the "n-step return error reduction property": But they…
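For context, the property discussed in that section states that the worst-case error of the expected $n$-step return is bounded by $\gamma^n$ times the worst-case error of the current value estimate:

$$\max_s \left| \mathbb{E}_\pi\!\left[ G_{t:t+n} \mid S_t = s \right] - v_\pi(s) \right| \;\le\; \gamma^n \max_s \left| V_{t+n-1}(s) - v_\pi(s) \right|$$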
11 votes · 1 answer

What exactly is the advantage of double DQN over DQN?

I started looking into the double DQN (DDQN). Apparently, the difference between DDQN and DQN is that in DDQN we use the main value network for action selection and the target network for outputting the Q values. However, I don't understand why…
Chukwudi
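In symbols, the difference the question describes: DQN both selects and evaluates the bootstrap action with the target parameters $\theta^-$, while double DQN selects with the online parameters $\theta$ and evaluates with $\theta^-$, which is intended to reduce the overestimation bias of the max operator:

$$y^{\text{DQN}} = r + \gamma \max_{a'} Q(s', a'; \theta^-), \qquad y^{\text{DDQN}} = r + \gamma \, Q\!\big(s', \arg\max_{a'} Q(s', a'; \theta); \, \theta^-\big)$$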
11 votes · 1 answer

Are Q-learning and SARSA the same when action selection is greedy?

I'm currently studying reinforcement learning and I'm having difficulties with question 6.12 in Sutton and Barto's book. Suppose action selection is greedy. Is Q-learning then exactly the same algorithm as SARSA? Will they make exactly the same…
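A hint at why the exercise is subtle: under greedy action selection the two update *targets* coincide, since

$$R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) \;=\; R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) \quad \text{when } A_{t+1} = \arg\max_{a} Q(S_{t+1}, a),$$

yet the two algorithms update at different points relative to action selection, which is exactly what the exercise asks the reader to think through.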
10 votes · 1 answer

Can Q-learning be used in a POMDP?

Can Q-learning (and SARSA) be directly used in a Partially Observable Markov Decision Process (POMDP)? If not, why not? My intuition is that the policies learned will be terrible because of partial observability. Are there ways to transform these…
9 votes · 1 answer

How does Q-learning work in stochastic environments?

The Q function uses the (current and future) states to determine the action that gets the highest reward. However, in a stochastic environment, the current action (at the current state) does not determine the next state. How does Q-learning handle…
redlum
8 votes · 1 answer

What are other ways of handling invalid actions in scenarios where all rewards are either 0 (best reward) or negative?

I created an OpenAI Gym environment, and I would like to check the performance of the agent from OpenAI Baselines DQN approach on it. In my environment, the best possible outcome for the agent is 0 - the robot needs zero non-necessary resources to…
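One widely used technique, sketched generically below (this is not the OpenAI Baselines API; `masked_greedy_action` is a name made up for illustration), is to mask invalid actions out of the argmax so they can never be selected, regardless of the reward scale:

```python
import numpy as np

def masked_greedy_action(q_values, valid_mask):
    """Greedy action restricted to valid actions.

    q_values:   Q-value estimate per action
    valid_mask: True where the action is legal in the current state
    """
    masked = np.where(valid_mask, q_values, -np.inf)  # invalid actions lose every argmax
    return int(np.argmax(masked))

# works even when all rewards (and hence Q-values) are <= 0:
q = np.array([-1.2, -0.3, -4.0])
mask = np.array([True, False, True])
assert masked_greedy_action(q, mask) == 0
```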
8 votes · 1 answer

How is the DQN loss derived from (or theoretically motivated by) the Bellman equation, and how is it related to the Q-learning update?

I'm doing a project on Reinforcement Learning. I programmed an agent that uses DDQN. There are a lot of tutorials on that, so the code implementation was not that hard. However, I have problems understanding how one should come up with this kind of…
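For reference, the connection usually drawn: DQN regresses $Q(s, a; \theta)$ towards the sampled Bellman optimality target, which is the same quantity the tabular Q-learning update moves its estimate towards, just turned into a squared-error objective over a replay buffer $\mathcal{D}$:

$$L(\theta) = \mathbb{E}_{(s, a, r, s') \sim \mathcal{D}} \Big[ \big( r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \big)^2 \Big]$$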
8 votes · 2 answers

What are some online courses for deep reinforcement learning?

What are some (good) online courses for deep reinforcement learning? I would like the course to be both programming and theoretical. I really liked David Silver's course, but the course dates from 2015. It doesn't really teach deep Q-learning at…
8 votes · 1 answer

Does AlphaZero use Q-Learning?

I was reading the AlphaZero paper Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, and it seems they don't mention Q-Learning anywhere. So does AZ use Q-Learning on the results of self-play or just a Supervised…
8 votes · 2 answers

Can DQN perform better than Double DQN?

I'm training both DQN and double DQN in the same environment, but DQN performs significantly better than double DQN. As I've seen in the double DQN paper, double DQN should perform better than DQN. Am I doing something wrong or is it possible?
Angelo