Questions tagged [dyna]

For questions related to the reinforcement learning "dyna" architecture.

For more information, see e.g. https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node29.html.
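As background for the questions below: Dyna interleaves direct reinforcement learning from real experience with planning updates drawn from a learned model. A minimal tabular Dyna-Q sketch on a toy task (the 1-D corridor environment and all names here are illustrative, not taken from any of the questions):

```python
import random
from collections import defaultdict

def dyna_q(episodes=30, n_planning=10, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a toy 1-D corridor: states 0..4, start in 0,
    actions 0 = left / 1 = right, reward 1 for reaching state 4."""
    rng = random.Random(seed)
    GOAL, ACTIONS = 4, (0, 1)

    def step(s, a):  # deterministic corridor dynamics
        s2 = min(GOAL, s + 1) if a == 1 else max(0, s - 1)
        return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

    Q = defaultdict(float)   # Q[(state, action)]
    model = {}               # model[(state, action)] = (reward, next_state)

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection with random tie-breaking
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(Q[(s, x)] for x in ACTIONS)
                a = rng.choice([x for x in ACTIONS if Q[(s, x)] == best])
            s2, r, done = step(s, a)
            # (a) direct RL: one-step Q-learning on the real transition
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in ACTIONS) - Q[(s, a)])
            # (b) model learning: record the observed deterministic transition
            model[(s, a)] = (r, s2)
            # (c) planning: n extra Q-learning updates on simulated transitions
            for _ in range(n_planning):
                (ps, pa), (pr, ps2) = rng.choice(list(model.items()))
                Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, x)] for x in ACTIONS) - Q[(ps, pa)])
            s = s2
    return Q
```

Setting `n_planning=0` recovers plain one-step Q-learning, which is the comparison several of the questions below are about.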

5 questions
1 vote · 1 answer

Without planning, why does each episode only add one additional step to the policy?

In Example 8.1 on page 165 of Sutton & Barto's RL book, they say: Figure 8.3 shows why the planning agents found the solution so much faster than the nonplanning agent. Shown are the policies found by the n = 0 and n = 50 agents halfway through…
DSPinfinity
1 vote · 1 answer

If $\alpha$ decreases over time, why is Q-learning guaranteed to converge?

Q-learning is guaranteed to converge if $\alpha$ decreases over time. On page 161 of the RL book by Sutton and Barto, 2nd edition, Section 8.1, they write that Dyna-Q is guaranteed to converge if each state–action pair is selected an infinite number…
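The decreasing-$\alpha$ condition this excerpt refers to is the stochastic-approximation (Robbins–Monro) requirement: the step sizes for each state–action pair must satisfy $\sum_t \alpha_t = \infty$ and $\sum_t \alpha_t^2 < \infty$. A common schedule satisfying both is $\alpha = 1/n$ for the $n$-th visit to the pair; a minimal sketch (the function and variable names are illustrative):

```python
from collections import defaultdict

visits = defaultdict(int)

def alpha(state, action):
    """Step size 1/n for the n-th visit to (state, action).
    The harmonic series sum(1/n) diverges while sum(1/n**2) converges,
    so this schedule meets both Robbins-Monro conditions."""
    visits[(state, action)] += 1
    return 1.0 / visits[(state, action)]
```

A fixed $\alpha$ violates the second condition, which is why constant-step-size Q-learning only converges in expectation, not with probability 1, in stochastic environments.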
1 vote · 1 answer

How is trajectory sampling different from normal (importance) sampling in reinforcement learning?

I am using Sutton and Barto's book for reinforcement learning. In Chapter 8, I am having difficulty understanding trajectory sampling. I have read the section on it (Sec. 8.6) twice (and a third time partially), but…
SJa
1 vote · 0 answers

How do I know if the assumption of a static environment is made?

An important property of a reinforcement learning problem is whether the environment of the agent is static, meaning that nothing changes if the agent remains inactive. Different learning methods assume to varying degrees that the environment is…
1 vote · 0 answers

Eligibility trace In Model-based Reinforcement Learning

In model-based reinforcement learning algorithms, a model of the environment is constructed so that samples are used efficiently, as in methods such as Dyna and Prioritized Sweeping. Moreover, eligibility traces help the agent learn (action-)value functions…
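For context on the eligibility-trace mechanism this question combines with model-based methods: a trace assigns credit for a TD error to all recently visited states, decaying with $\gamma\lambda$. A minimal tabular TD($\lambda$) update with accumulating traces (a generic sketch; the function name and toy values are illustrative):

```python
from collections import defaultdict

def td_lambda_update(V, traces, s, r, s2, alpha=0.1, gamma=0.95, lam=0.9):
    """One TD(lambda) step with accumulating eligibility traces:
    the one-step TD error is broadcast to every recently visited
    state, weighted by that state's (decaying) trace value."""
    traces[s] += 1.0                    # accumulate trace for the current state
    delta = r + gamma * V[s2] - V[s]    # one-step TD error
    for state in list(traces):
        V[state] += alpha * delta * traces[state]
        traces[state] *= gamma * lam    # decay all traces toward zero
    return V, traces
```

Combining this backward-view credit assignment with a learned model (as the question asks) is less standard than either piece alone, since planning updates replay transitions out of trajectory order.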