Some RL literature uses terms such as 'Bellman backup' and 'Bellman error'. What do these terms refer to?

nbro

user529295
There's already an answer that addresses both concerns/questions, but, please, next time, focus on one question per post, although, in this case, the terms are highly related (but I still think these "simple" questions could have been asked in separate posts). It may also be a good idea to provide more context (e.g. a link to an article that mentions these terms), although, again, in this case, anyone familiar with RL would be able to understand the question. – nbro Jun 28 '21 at 13:15
1 Answer
3
A Bellman backup is an application of a Bellman operator. For example, the step
$$ V(x)\leftarrow \alpha(R + \mathbf{E}[V(x')]) + (1-\alpha)V(x) $$
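is a Bellman backup for some learning rate $\alpha$. Here $V$ is the value estimate, $x$ is the current state, $x'$ is the next state, and $R$ is the reward; no discount factor appears in this form.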
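As a minimal sketch of that update, assuming a tabular value function stored in a dict and a known distribution over next states (the names `bellman_backup` and `next_state_probs` are hypothetical, chosen only for illustration):

```python
def bellman_backup(V, x, reward, next_state_probs, alpha):
    """Apply one backup to V[x] in place and return the new value.

    V                : dict mapping state -> current value estimate
    reward           : R in the formula above
    next_state_probs : dict mapping next state x' -> probability (assumed known)
    alpha            : learning rate in [0, 1]
    """
    # E[V(x')] under the assumed next-state distribution
    expected_next = sum(p * V[x_next] for x_next, p in next_state_probs.items())
    # Backup target: R + E[V(x')] (no discount factor, as in the formula above)
    target = reward + expected_next
    # Move the old estimate toward the target by a fraction alpha
    V[x] = alpha * target + (1 - alpha) * V[x]
    return V[x]


# Toy example: from state "a" we reach "b" or "c" with equal probability.
V = {"a": 0.0, "b": 2.0, "c": 4.0}
new_value = bellman_backup(
    V, "a", reward=1.0, next_state_probs={"b": 0.5, "c": 0.5}, alpha=0.5
)
```

With $\alpha = 1$ the old estimate is replaced by the target entirely; smaller values of $\alpha$ blend the two.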
A Bellman error is
$$ d(V(x), R + \mathbf{E}[V(x')]) $$
for some distance function $d$, usually the squared difference $d(x, y) = (x-y)^2$.
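A small sketch of the squared-difference case, again assuming a tabular value function in a dict and a known next-state distribution (`bellman_error` and `next_state_probs` are hypothetical names, not taken from any library):

```python
def bellman_error(V, x, reward, next_state_probs):
    """Squared Bellman error of the estimate V[x].

    V                : dict mapping state -> current value estimate
    reward           : R in the formula above
    next_state_probs : dict mapping next state x' -> probability (assumed known)
    """
    # E[V(x')] under the assumed next-state distribution
    expected_next = sum(p * V[x_next] for x_next, p in next_state_probs.items())
    # Target that V(x) is compared against: R + E[V(x')]
    target = reward + expected_next
    # d(x, y) = (x - y)^2
    return (V[x] - target) ** 2


# Toy example: from state "a" we reach "b" or "c" with equal probability.
V = {"a": 0.0, "b": 2.0, "c": 4.0}
err = bellman_error(V, "a", reward=1.0, next_state_probs={"b": 0.5, "c": 0.5})
```

An error of zero means the estimate at that state already agrees with its one-step target.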

harwiltz
It refers to propagating information from later states to earlier ones (sort of backward in time). – harwiltz Jun 28 '21 at 13:02
It may be a good idea to 1. also provide the [figures of the backups e.g. that you can find in Sutton and Barto's book](http://incompleteideas.net/book/first/figures/figures.html), 2. link the OP to [this question](https://ai.stackexchange.com/q/11057/2444) about what the Bellman operator is, and 3. explain the symbols in your answer. – nbro Jun 28 '21 at 13:12