Questions tagged [dynamic-programming]

For questions related to the dynamic programming paradigm in the context of AI (and, in particular, reinforcement learning).

25 questions
10 votes, 3 answers

What algorithms are considered reinforcement learning algorithms?

What are the areas/algorithms that belong to reinforcement learning? TD(0), Q-Learning and SARSA are all temporal-difference algorithms, which belong to the reinforcement learning area, but is there more to it? Are the dynamic programming…
5 votes, 1 answer

Why does TD Learning require Markovian domains?

One of my friends and I were discussing the differences between Dynamic Programming, Monte-Carlo, and Temporal Difference (TD) Learning as policy evaluation methods - and we agreed on the fact that Dynamic Programming requires the Markov assumption…
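For context, the tabular TD(0) update being compared is

$$ V(S_t) \leftarrow V(S_t) + \alpha \big[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \big] $$

The bootstrap term $V(S_{t+1})$ stands in for the entire future, which is only justified when the state is Markov; Monte Carlo instead waits for the complete return $G_t$ and needs no such assumption.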
4 votes, 1 answer

Why is the update rule of the value function different in policy evaluation and policy iteration?

In the textbook "Reinforcement Learning: An Introduction", by Richard Sutton and Andrew Barto, the pseudocode for Policy Evaluation is given as follows: The update equation for $V(s)$ comes from the Bellman equation for $v_{\pi}(s)$, which is…
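A minimal sketch of that update loop as code, assuming a small tabular MDP passed in as plain dicts (the data layout here is illustrative, not from the book):

```python
def policy_evaluation(P, R, policy, gamma=0.9, theta=1e-8):
    """In-place iterative policy evaluation, following the book's pseudocode.

    P[s][a]   : dict mapping next state s' -> p(s' | s, a)
    R[s][a]   : expected immediate reward for taking a in s
    policy[s] : dict mapping action a -> pi(a | s)
    Every successor state must appear as a key of P; terminal states
    map to an empty action dict and keep value 0.
    """
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = V[s]
            # Bellman expectation backup for v_pi(s)
            V[s] = sum(
                policy[s][a]
                * (R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items()))
                for a in P[s]
            )
            delta = max(delta, abs(v - V[s]))
        if delta < theta:  # stop once the largest per-sweep change is tiny
            return V
```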
3 votes, 1 answer

Are policy and value iteration used only in grid-world-like scenarios?

I am trying to teach myself reinforcement learning. At the moment I am focusing on policy and value iteration, and I am running into several problems and doubts. One of the main doubts is that I can't find many diversified examples of how…
3 votes, 1 answer

What are the common techniques one could use to deal with collisions in a transposition table?

Consider an iterative deepening search using a transposition table. Whenever the transposition table is full, what are common strategies applied to replace entries in the table? I'm aware of two strategies: Replace the oldest entry whenever a new…
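Beyond the two strategies the asker mentions, a common arrangement is a two-tier bucket: a depth-preferred slot next to an always-replace slot. A sketch under those assumptions (all names illustrative):

```python
from dataclasses import dataclass

@dataclass
class Entry:
    key: int      # full hash key, kept to detect index collisions on probe
    depth: int    # search depth that produced the stored result
    score: float

class TranspositionTable:
    def __init__(self, size: int):
        self.size = size
        self.deep = [None] * size    # depth-preferred tier
        self.always = [None] * size  # always-replace tier

    def store(self, key: int, depth: int, score: float) -> None:
        i = key % self.size
        entry = Entry(key, depth, score)
        slot = self.deep[i]
        if slot is None or depth >= slot.depth:
            self.deep[i] = entry     # deeper results are more expensive to recompute
        else:
            self.always[i] = entry   # shallow results go to the volatile tier

    def probe(self, key: int):
        i = key % self.size
        for slot in (self.deep[i], self.always[i]):
            if slot is not None and slot.key == key:
                return slot
        return None
```

The depth-preferred tier protects expensive deep searches from being evicted by the flood of shallow entries near the leaves, while the always-replace tier keeps the table fresh.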
2 votes, 1 answer

Do we need the transition probability function when calculating the importance sampling ratio?

I am reading the book titled "Reinforcement Learning: An Introduction" (by Sutton and Barto). I am at chapter 5, which is about Monte Carlo methods, but now I am quite confused. There is one thing I don't particularly understand. Why do we need the…
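For context, the resolution in chapter 5 of the book is that the transition probabilities appear in both the target-policy and behaviour-policy trajectory probabilities and cancel, leaving a ratio that depends only on the two policies:

$$ \rho_{t:T-1} = \prod_{k=t}^{T-1} \frac{\pi(A_k \mid S_k)\, p(S_{k+1} \mid S_k, A_k)}{b(A_k \mid S_k)\, p(S_{k+1} \mid S_k, A_k)} = \prod_{k=t}^{T-1} \frac{\pi(A_k \mid S_k)}{b(A_k \mid S_k)} $$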
2 votes, 2 answers

Should we feed a greater fraction of terminal states to the value network so that their values are learned first?

The basis of Q-learning is recursive (similar to dynamic programming), where only the value of the terminal state is known exactly. Wouldn't it make sense to feed the model a greater proportion of terminal states initially, to ensure that the…
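For reference, the recursion in question is the tabular Q-learning update

$$ Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \big[ R_{t+1} + \gamma \max_{a'} Q(S_{t+1}, a') - Q(S_t, A_t) \big] $$

where the bootstrap term $\max_{a'} Q(S_{t+1}, a')$ is defined to be zero when $S_{t+1}$ is terminal, so targets for terminal transitions are exact from the start, while all earlier targets are biased until the values they bootstrap from converge.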
2 votes, 0 answers

What trait of a planning problem makes reinforcement learning a well-suited solution?

Planning problems were among the first problems studied at the dawn of AI (Shakey the robot). Graph search (e.g. A*) and planning (e.g. GraphPlan) algorithms can be very efficient at generating a plan. As for problem formulation, for planning…
2 votes, 1 answer

Is it possible to retrieve the optimal policy from the state value function?

One can easily retrieve the optimal policy from the action value function, but how about obtaining it from the state value function?
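For reference, the standard construction is a greedy one-step lookahead, which, unlike the $Q$-based version, requires the model $p(s' \mid s, a)$:

$$ \pi_*(s) = \arg\max_a \sum_{s'} p(s' \mid s, a) \big[ r(s, a) + \gamma V_*(s') \big] $$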
1 vote, 1 answer

Interpretation of the Dynamic Time Warping (DTW) graph

How can I interpret the DTW graph? I understand the algorithm behind DTW, but I struggle to interpret the graph. When I compute the DTW for a signal that is the same signal but shifted in time, my result will be a diagonal, straight line…
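A minimal sketch of the accumulated-cost matrix behind such a graph, assuming NumPy and absolute difference as the local cost (both illustrative choices):

```python
import numpy as np

def dtw_matrix(x, y):
    """Accumulated-cost matrix D; the DTW 'graph' is the cheapest path through it."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)  # padded border blocks illegal moves
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[1:, 1:]

# Identical signals put the warping path exactly on the main diagonal;
# a time shift bends the path off the diagonal by roughly the shift size.
t = np.linspace(0, 2 * np.pi, 100)
D = dtw_matrix(np.sin(t), np.sin(t + 0.5))
```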
1 vote, 0 answers

Structured policies in dynamic programming: solving a toy example

I am trying to solve a dynamic programming toy example. Here is the prompt: imagine you arrive in a new city for $N$ days and every night you need to pick a restaurant to have dinner at. The qualities of the restaurants are iid according to distribution…
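Under one plausible reading of the (truncated) prompt, where each night you either revisit the best restaurant found so far or sample a new one iid from a finite distribution, backward induction over (nights left, best quality so far) solves it. Everything below is a hypothetical sketch of that reading, not the original assignment:

```python
def best_restaurant_value(N, qualities, probs):
    """V[n][b]: optimal expected total quality with n nights left, when the
    best restaurant found so far has index b. Assumes qualities is sorted
    ascending, so a larger index always means a better restaurant."""
    K = len(qualities)
    V = [[0.0] * K for _ in range(N + 1)]
    for n in range(1, N + 1):
        for b in range(K):
            exploit = qualities[b] + V[n - 1][b]   # revisit the best so far
            explore = sum(                          # try a fresh iid draw
                probs[j] * (qualities[j] + V[n - 1][max(b, j)])
                for j in range(K)
            )
            V[n][b] = max(exploit, explore)
    return V
```

The usual finite-horizon structure shows up: with many nights remaining, exploring is worth the risk; as the horizon shrinks, exploiting the best known restaurant takes over.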
1 vote, 0 answers

Is there a way to use DQN to find the optimal combination of actions (control inputs) and environment parameters?

I am using DQN to find the optimal sequence of control inputs to a dynamic system. The setup is as follows: At the beginning of each episode, the system is initialized to the SAME initial condition s0. Each episode spans from 0 seconds to tf…
1 vote, 1 answer

How are these two versions of the Bellman optimality equation related?

I saw two versions of the optimality equation for $V_{*}(s)$ and $Q_{*}(s,a)$. The first one is: $$ V_{*}(s)=\max _{a} \sum_{s^{\prime}} P_{s s^{\prime}}^{a}\left(r(s, a)+\gamma V_{*}\left(s^{\prime}\right)\right) $$ and $$ Q_{*}(s,…
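For reference, the two optimal value functions are linked by $V_*(s) = \max_a Q_*(s, a)$, which in this notation gives the standard form

$$ Q_*(s, a) = \sum_{s'} P_{s s'}^{a}\left( r(s, a) + \gamma \max_{a'} Q_*\left(s', a'\right) \right) $$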
1 vote, 1 answer

Is there a notion of exploration-exploitation tradeoff in dynamic programming (or model-based RL)?

Is there a notion of exploration-exploitation tradeoff in dynamic programming (or model-based RL)?
1 vote, 1 answer

How do we get from conditional expectation on both state and action to only state in the proof of the Policy Improvement Theorem?

I'm going through Sutton and Barto's book Reinforcement Learning: An Introduction and I'm trying to understand the proof of the Policy Improvement Theorem, presented on page 78 of the physical book. The theorem goes as follows: Let $\pi$ and $\pi'$…
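For reference, the theorem states that if $q_\pi(s, \pi'(s)) \ge v_\pi(s)$ for all $s \in \mathcal{S}$, then $v_{\pi'}(s) \ge v_\pi(s)$ for all $s \in \mathcal{S}$. The step the question asks about is

$$ \mathbb{E}\left[ R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s, A_t = \pi'(s) \right] = \mathbb{E}_{\pi'}\left[ R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s \right], $$

which holds because $\pi'$ is deterministic: under $\pi'$, conditioning on $S_t = s$ already fixes $A_t = \pi'(s)$.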