Questions tagged [bellman-equations]

For questions related to the Bellman equations in the context of reinforcement learning (and other artificial intelligence subfields).

60 questions
27
votes
1 answer
What is the Bellman operator in reinforcement learning?
In mathematics, the word operator can refer to several distinct but related concepts. An operator can be defined as a function between two vector spaces, as a function whose domain and codomain are the same, or it can be…

nbro
- 39,006
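For context, one standard definition (following Sutton & Barto's notation; conventions vary): for a fixed policy $\pi$, the Bellman expectation operator $T^{\pi}$ maps a value function $V$ to
$$(T^{\pi} V)(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma V(s') \right],$$
while the Bellman optimality operator $T$ replaces the policy average with a maximum:
$$(T V)(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma V(s') \right].$$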
9
votes
2 answers
Why are the Bellman operators contractions?
In these slides, it is written
\begin{align}
\left\|T^{\pi} V-T^{\pi} U\right\|_{\infty} & \leq \gamma\|V-U\|_{\infty} \tag{9} \label{9} \\
\|T V-T U\|_{\infty} & \leq \gamma\|V-U\|_{\infty} \tag{10} \label{10}
\end{align}
where
$F$ is the space of…

kevin
- 191
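The standard one-line argument behind inequality (9), sketched here for context: for any state $s$,
\begin{align}
\left| (T^{\pi} V)(s) - (T^{\pi} U)(s) \right| &= \gamma \left| \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a) \left[ V(s') - U(s') \right] \right| \\
&\leq \gamma \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a) \left\| V - U \right\|_{\infty} = \gamma \left\| V - U \right\|_{\infty},
\end{align}
and taking the supremum over $s$ gives the contraction. Inequality (10) follows similarly, using the fact that the max operator is a non-expansion.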
8
votes
1 answer
How is the DQN loss derived from (or theoretically motivated by) the Bellman equation, and how is it related to the Q-learning update?
I'm doing a project on Reinforcement Learning. I programmed an agent that uses DDQN. There are a lot of tutorials on that, so the code implementation was not that hard.
However, I have problems understanding how one should come up with this kind of…

Yves Boutellier
- 183
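A minimal sketch of how the Bellman equation typically becomes the DQN loss, in PyTorch-style Python (the network names and batch layout here are hypothetical, not taken from the question):

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Squared Bellman error between Q(s, a) and the bootstrapped
    target r + gamma * max_a' Q_target(s', a')."""
    states, actions, rewards, next_states, dones = batch

    # Q-values of the actions that were actually taken
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bootstrapped Bellman target; no gradient flows through the target network
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * max_next_q

    return F.mse_loss(q_sa, target)
```

In the tabular case, one step of stochastic gradient descent on this loss recovers the familiar Q-learning update.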
8
votes
2 answers
What is the proof that policy evaluation converges to the optimal solution?
Although I know how the algorithm of iterative policy evaluation using dynamic programming works, I am having a hard time seeing why it actually converges.
It is intuitive that, with each iteration, we get a better and better…

SAGALPREET SINGH
- 147
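For reference, a minimal NumPy sketch of iterative policy evaluation (the array layout is an assumption for illustration): convergence follows because each sweep applies the Bellman expectation operator, which is a $\gamma$-contraction in the sup norm, so the iterates approach its unique fixed point $v_{\pi}$.

```python
import numpy as np

def policy_evaluation(P, R, pi, gamma=0.9, tol=1e-8):
    """P[s, a, s'] = transition probabilities, R[s, a] = expected rewards,
    pi[s, a] = policy probabilities (hypothetical layout)."""
    v = np.zeros(P.shape[0])
    while True:
        # One application of the Bellman expectation operator T^pi
        q = R + gamma * P @ v          # q[s, a]
        v_new = (pi * q).sum(axis=1)   # average over actions under pi
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
```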
7
votes
1 answer
Why do Bellman equations indirectly create a policy?
I was watching a lecture on policy gradients and Bellman equations. The lecturer says that a Bellman equation indirectly creates a policy, while the policy gradient directly learns a policy. Why is this?

echo
- 673
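One way to see the distinction: value-based methods never parameterize a policy at all; the policy is implicit, recovered by acting greedily with respect to the learned values. A sketch with a hypothetical tabular $Q$:

```python
import numpy as np

def greedy_policy(Q):
    """The Bellman equation yields values Q(s, a); the policy is only
    defined 'indirectly', as pi(s) = argmax_a Q(s, a). Policy-gradient
    methods instead parameterize and optimize pi directly."""
    return np.argmax(Q, axis=1)  # Q is a hypothetical |S| x |A| table
```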
7
votes
0 answers
Is the Bellman update that samples actions weighted by their Q-values (instead of taking the max) a contraction?
It has been proved that the Bellman update is a contraction (1).
Here is the Bellman update that is used for Q-Learning:
$$Q_{t+1}(s, a) = Q_{t}(s, a) + \alpha \left( r(s, a, s') + \gamma \max_{a^*} Q_{t}(s', a^*) - Q_t(s, a) \right) \tag{1} \label{1}$$
The proof…

sirfroggy
- 71
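My reading of the variant being asked about (an assumption, since the question is truncated) is a Boltzmann-weighted backup, where the $\max$ in equation (1) above is replaced by an average over next actions weighted by their Q-values:
$$Q_{t+1}(s, a) = Q_{t}(s, a) + \alpha \left( r(s, a, s') + \gamma \sum_{a'} \frac{e^{\beta Q_t(s', a')}}{\sum_{b} e^{\beta Q_t(s', b)}} Q_t(s', a') - Q_t(s, a) \right).$$
The question is nontrivial because, unlike the $\max$, the Boltzmann softmax operator is in general not a non-expansion, so the usual contraction proof does not go through directly.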
7
votes
2 answers
Why does the state-action value function, defined as an expected value of the reward and state value function, not need to follow a policy?
I often see that the state-action value function is expressed as:
$$q_{\pi}(s,a)=\color{red}{\mathbb{E}_{\pi}}[R_{t+1}+\gamma G_{t+1} | S_t=s, A_t = a] = \color{blue}{\mathbb{E}}[R_{t+1}+\gamma v_{\pi}(s') |S_t = s, A_t =a]$$
Why does expressing the…

Daniel Wiczew
- 323
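The missing step, spelled out: conditioning on both $S_t = s$ and $A_t = a$ already fixes the first action, so $\pi$ plays no role until time $t+1$; once $G_{t+1}$ is replaced by its conditional expectation $v_{\pi}(S_{t+1})$, all dependence on the policy is absorbed into $v_{\pi}$, and the outer expectation no longer needs the $\pi$ subscript:
$$q_{\pi}(s, a) = \mathbb{E}\left[ R_{t+1} + \gamma v_{\pi}(S_{t+1}) \mid S_t = s, A_t = a \right] = \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma v_{\pi}(s') \right].$$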
6
votes
2 answers
What is the Bellman equation actually telling us?
What does the Bellman equation actually say? And are there many flavours of it?
I get a little confused when I look up the Bellman equation, because people seem to say slightly different things about what it is. And I think the…

Johnny
- 69
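For reference, the "flavours" usually reduce to two forms, each stated for either $v$ or $q$: the Bellman expectation equation for a fixed policy,
$$v_{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma v_{\pi}(s') \right],$$
and the Bellman optimality equation,
$$v_{*}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma v_{*}(s') \right].$$
Both say the same thing: the value of a state decomposes into the immediate reward plus the discounted value of the successor state.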
5
votes
1 answer
How would I compute the optimal state-action value for a certain state and action?
I am currently trying to learn reinforcement learning and I started with the basic gridworld application. I tried Q-learning with the following parameters:
Learning rate = 0.1
Discount factor = 0.95
Exploration rate = 0.1
Default reward = 0
The…

Rim Sleimi
- 215
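For context, one tabular Q-learning step with the parameters listed above (the gridworld encoding is hypothetical): under the usual conditions, repeated updates converge to $Q^*$, from which the optimal state-action value of any pair can be read off.

```python
import numpy as np

alpha, gamma, epsilon = 0.1, 0.95, 0.1  # learning rate, discount, exploration rate

def q_update(Q, s, a, r, s_next):
    """One Q-learning step on a tabular Q with integer states/actions."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
```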
4
votes
1 answer
In reinforcement learning, does the optimal value correspond to performing the best action in a given state?
I am confused about the definition of the optimal value ($V^*$) and optimal action-value ($Q^*$) in reinforcement learning, so I need some clarification, because some blogs I read on Medium and GitHub are inconsistent with the literature.
Originally, I…

Rui Nian
- 423
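The definitions most of the literature agrees on:
$$V^*(s) = \max_{\pi} V^{\pi}(s), \qquad Q^*(s, a) = \max_{\pi} Q^{\pi}(s, a),$$
together with the relation $V^*(s) = \max_{a} Q^*(s, a)$: the optimal value of a state is achieved by taking the best action there and acting optimally afterwards.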
4
votes
2 answers
How do we get the optimal value-function?
Here it says the following (is it correct?):
$$V^\pi(s) = \sum_{a \in A}\pi(a|s) \, Q^\pi(s,a)$$
And we have:
$$V^*(s) = \max_\pi V^\pi(s)$$
Also:
$$V^*(s) = \max_a Q^*(s, a)$$
Can someone demonstrate to me step by step how we got from $V^*(s) = \max_\pi…

Ness
- 216
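A sketch of the step being asked about: since $V^{\pi}(s) = \sum_{a} \pi(a \mid s) Q^{\pi}(s, a)$ is a weighted average of the $Q^{\pi}(s, a)$, it is maximized by a policy that puts all of its probability on a best action, and the best achievable value of each term is $Q^*(s, a)$. Hence
$$V^*(s) = \max_{\pi} \sum_{a} \pi(a \mid s) Q^{\pi}(s, a) = \max_{a} Q^*(s, a).$$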
4
votes
1 answer
What do the terms 'Bellman backup' and 'Bellman error' mean?
Some RL literature uses terms such as 'Bellman backup' and 'Bellman error'. What do these terms refer to?

user529295
- 359
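In common usage: a Bellman backup is one application of a Bellman operator at a state, e.g. $(TV)(s) = \max_{a} \mathbb{E}\left[ r + \gamma V(s') \mid s, a \right]$, so called because it "backs up" value information from successor states. The Bellman error (or Bellman residual) is the discrepancy between a value function and its backup, $\delta(s) = (TV)(s) - V(s)$; its sampled, one-step form is the familiar TD error $\delta_t = R_{t+1} + \gamma V(S_{t+1}) - V(S_t)$.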
4
votes
1 answer
How to prove the second form of Bellman's equation?
I'd like to prove this "second form" of Bellman's equation: $v(s) = \mathbb{E}[R_{t + 1} + \gamma v(S_{t+1}) \mid S_{t} = s]$ starting from Bellman's equation: $v(s) = \mathbb{E}[G_{t} \mid S_{t} = s]$ where the return $G_{t}$ is defined as follows:…

Daviiid
- 563
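The proof is short once the return's recursion $G_t = R_{t+1} + \gamma G_{t+1}$ is in hand:
\begin{align}
v(s) &= \mathbb{E}\left[ G_t \mid S_t = s \right] = \mathbb{E}\left[ R_{t+1} + \gamma G_{t+1} \mid S_t = s \right] \\
&= \mathbb{E}\left[ R_{t+1} \mid S_t = s \right] + \gamma \, \mathbb{E}\left[ \mathbb{E}\left[ G_{t+1} \mid S_{t+1} \right] \mid S_t = s \right] \\
&= \mathbb{E}\left[ R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s \right],
\end{align}
using linearity of expectation, the tower property (law of total expectation), and the Markov property.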
4
votes
1 answer
How are afterstate value functions mathematically defined?
In this answer, afterstate value functions are mentioned, along with the fact that temporal-difference (TD) and Monte Carlo (MC) methods can also use these value functions. Mathematically, how are these value functions defined? Yes, they are a function of the next…

nbro
- 39,006
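One common way to formalize them (an assumption about the setting, since conventions differ): suppose each transition factors into a deterministic "move" $y = f(s, a)$ to an afterstate $y$, followed by a stochastic step to the next state. Then the afterstate value function is
$$v_{\text{after}}(y) = \mathbb{E}_{\pi}\left[ G_t \mid f(S_t, A_t) = y \right],$$
and whenever the reward depends only on the afterstate, $q_{\pi}(s, a) = v_{\text{after}}(f(s, a))$, so distinct $(s, a)$ pairs that lead to the same afterstate share a single value.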
4
votes
1 answer
Why doesn't value iteration use $\pi(a \mid s)$ while policy evaluation does?
I was looking at the Bellman equation, and I noticed a difference between the equations used in policy evaluation and value iteration.
In policy evaluation, the term $\pi(a \mid s)$ appears, which indicates the probability of choosing…

Chukwudi Ogbonna
- 125
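Placing the two updates side by side makes the difference visible:
\begin{align}
v_{k+1}(s) &= \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma v_k(s') \right] && \text{(policy evaluation)} \\
v_{k+1}(s) &= \max_{a} \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma v_k(s') \right] && \text{(value iteration)}
\end{align}
Policy evaluation averages over the actions a fixed policy $\pi$ would take, which is where $\pi(a \mid s)$ enters; value iteration instead commits to the best action via the $\max$, so no policy probabilities appear.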