Questions tagged [dyna]

For questions related to the reinforcement learning "dyna" architecture.

For more information, see e.g. https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node29.html.
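As background for the questions below: Dyna interleaves direct reinforcement learning from real experience with planning updates drawn from a learned model. A minimal tabular Dyna-Q sketch on a toy task (the 1-D corridor environment and all names here are illustrative, not taken from any of the questions):

```python
import random
from collections import defaultdict

def dyna_q(episodes=30, n_planning=10, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a toy 1-D corridor: states 0..4, start in 0,
    actions 0 = left / 1 = right, reward 1 for reaching state 4."""
    rng = random.Random(seed)
    GOAL, ACTIONS = 4, (0, 1)

    def step(s, a):  # deterministic corridor dynamics
        s2 = min(GOAL, s + 1) if a == 1 else max(0, s - 1)
        return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

    Q = defaultdict(float)   # Q[(state, action)]
    model = {}               # model[(state, action)] = (reward, next_state)

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection with random tie-breaking
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(Q[(s, x)] for x in ACTIONS)
                a = rng.choice([x for x in ACTIONS if Q[(s, x)] == best])
            s2, r, done = step(s, a)
            # (a) direct RL: one-step Q-learning on the real transition
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in ACTIONS) - Q[(s, a)])
            # (b) model learning: record the observed deterministic transition
            model[(s, a)] = (r, s2)
            # (c) planning: n extra Q-learning updates on simulated transitions
            for _ in range(n_planning):
                (ps, pa), (pr, ps2) = rng.choice(list(model.items()))
                Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, x)] for x in ACTIONS) - Q[(ps, pa)])
            s = s2
    return Q
```

Setting `n_planning=0` recovers plain one-step Q-learning, which is the comparison several of the questions below are about.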

5 questions
1 vote · 1 answer

Without planning, why does each episode only add one additional step to the policy?

In Example 8.1 on page 165 of Sutton & Barto's RL book, they say: Figure 8.3 shows why the planning agents found the solution so much faster than the nonplanning agent. Shown are the policies found by the n = 0 and n = 50 agents halfway through…
DSPinfinity
1 vote · 1 answer

If $\alpha$ decreases over time, why is Q-learning guaranteed to converge?

Q-learning is guaranteed to converge if $\alpha$ decreases over time. On page 161 of the RL book by Sutton and Barto, 2nd edition, Section 8.1, they write that Dyna-Q is guaranteed to converge if each state–action pair is selected an infinite number…
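The decreasing-$\alpha$ condition this excerpt refers to is the stochastic-approximation (Robbins–Monro) requirement: the step sizes for each state–action pair must satisfy $\sum_t \alpha_t = \infty$ and $\sum_t \alpha_t^2 < \infty$. A common schedule satisfying both is $\alpha = 1/n$ for the $n$-th visit to the pair; a minimal sketch (the function and variable names are illustrative):

```python
from collections import defaultdict

visits = defaultdict(int)

def alpha(state, action):
    """Step size 1/n for the n-th visit to (state, action).
    The harmonic series sum(1/n) diverges while sum(1/n**2) converges,
    so this schedule meets both Robbins-Monro conditions."""
    visits[(state, action)] += 1
    return 1.0 / visits[(state, action)]
```

A fixed $\alpha$ violates the second condition, which is why constant-step-size Q-learning only converges in expectation, not with probability 1, in stochastic environments.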
1 vote · 1 answer

How is trajectory sampling different from normal (importance) sampling in reinforcement learning?

I am using Sutton and Barto's book for reinforcement learning. In Chapter 8, I am having difficulty understanding trajectory sampling. I have read the section on it (Sec. 8.6) twice (and a third time partially), but…
SJa
1 vote · 0 answers

How do I know if the assumption of a static environment is made?

An important property of a reinforcement learning problem is whether the environment of the agent is static, meaning that nothing changes if the agent remains inactive. Different learning methods assume to varying degrees that the environment is…
1 vote · 0 answers

Eligibility trace In Model-based Reinforcement Learning

In model-based reinforcement learning algorithms, a model of the environment is constructed so that samples are used efficiently, as in methods such as Dyna and Prioritized Sweeping. Moreover, eligibility traces help the agent learn (action-)value functions…
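For context on the eligibility-trace mechanism this question combines with model-based methods: a trace assigns credit for a TD error to all recently visited states, decaying with $\gamma\lambda$. A minimal tabular TD($\lambda$) update with accumulating traces (a generic sketch; the function name and toy values are illustrative):

```python
from collections import defaultdict

def td_lambda_update(V, traces, s, r, s2, alpha=0.1, gamma=0.95, lam=0.9):
    """One TD(lambda) step with accumulating eligibility traces:
    the one-step TD error is broadcast to every recently visited
    state, weighted by that state's (decaying) trace value."""
    traces[s] += 1.0                    # accumulate trace for the current state
    delta = r + gamma * V[s2] - V[s]    # one-step TD error
    for state in list(traces):
        V[state] += alpha * delta * traces[state]
        traces[state] *= gamma * lam    # decay all traces toward zero
    return V, traces
```

Combining this backward-view credit assignment with a learned model (as the question asks) is less standard than either piece alone, since planning updates replay transitions out of trajectory order.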