Some RL literature uses terms such as 'Bellman backup' and 'Bellman error'. What do these terms refer to?

nbro
user529295
  • There's already an answer that addresses both concerns/questions, but, please, next time, focus on one question per post, although, in this case, the terms are highly related (but I still think these "simple" questions could have been asked in separate posts). It may also be a good idea to provide more context (e.g. a link to an article that mentions these terms), although, again, in this case, anyone familiar with RL would be able to understand the question. – nbro Jun 28 '21 at 13:15

1 Answer

A Bellman backup is an application of a Bellman operator. For example, the step

$$ V(x)\leftarrow \alpha(R + \mathbf{E}[V(x')]) + (1-\alpha)V(x) $$

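is a Bellman backup with learning rate $\alpha$. Here $V$ is the current value estimate, $x$ is a state, $R$ is the reward received on leaving $x$, and the expectation is taken over the next state $x'$.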
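
As a rough illustration of the update above, here is a minimal sketch on a made-up 3-state chain. The transition table `P`, the rewards `R`, and the function name `bellman_backup` are all assumptions chosen for the example, not part of any library. State 2 is treated as terminal, so the undiscounted values stay finite.

```python
# Hypothetical 3-state chain under a fixed policy; state 2 is terminal.
# P[x][y] is the assumed probability of moving from state x to state y.
P = [
    [0.0, 1.0, 0.0],  # state 0 always moves to state 1
    [0.0, 0.0, 1.0],  # state 1 always moves to state 2
    [0.0, 0.0, 1.0],  # terminal state loops on itself
]
R = [0.0, 1.0, 0.0]   # assumed reward received on leaving each state


def bellman_backup(V, x, alpha):
    """Return a copy of V with one backup applied at state x."""
    expected_next = sum(p * v for p, v in zip(P[x], V))  # E[V(x')]
    target = R[x] + expected_next                        # R + E[V(x')]
    V_new = list(V)
    V_new[x] = alpha * target + (1 - alpha) * V[x]
    return V_new


V = [0.0, 0.0, 0.0]
V = bellman_backup(V, 1, alpha=1.0)  # state 1 picks up its reward
V = bellman_backup(V, 0, alpha=1.0)  # state 0 now sees the value of state 1
print(V)  # [1.0, 1.0, 0.0]
```

Backing up state 1 before state 0 lets the reward flow from the later state to the earlier one in two steps. With `alpha` below 1, each backup only moves `V[x]` part of the way toward the target.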

A Bellman error is

$$ d(V(x), R + \mathbf{E}[V(x')]) $$

for some distance function $d$, usually the squared difference $d(a, b) = (a-b)^2$.
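
To make the definition concrete, the sketch below computes the squared Bellman error per state on a made-up 3-state chain. The tables `P` and `R` and the function name `bellman_error` are assumptions for illustration only; state 2 is treated as terminal.

```python
# Hypothetical 3-state chain under a fixed policy; state 2 is terminal.
P = [
    [0.0, 1.0, 0.0],  # state 0 always moves to state 1
    [0.0, 0.0, 1.0],  # state 1 always moves to state 2
    [0.0, 0.0, 1.0],  # terminal state loops on itself
]
R = [0.0, 1.0, 0.0]   # assumed reward received on leaving each state


def bellman_error(V, x):
    """Squared difference between V(x) and its target R + E[V(x')]."""
    expected_next = sum(p * v for p, v in zip(P[x], V))
    target = R[x] + expected_next
    return (V[x] - target) ** 2


V_guess = [0.0, 0.0, 0.0]  # an arbitrary initial estimate
V_fixed = [1.0, 1.0, 0.0]  # satisfies V(x) = R + E[V(x')] at every state

errors_guess = [bellman_error(V_guess, x) for x in range(3)]
errors_fixed = [bellman_error(V_fixed, x) for x in range(3)]
print(errors_guess)  # [0.0, 1.0, 0.0]
print(errors_fixed)  # [0.0, 0.0, 0.0]
```

The error is zero exactly where the estimate already agrees with its own one-step target, which is why it is a natural quantity to minimise when learning $V$.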

harwiltz
  • What does 'backup' refer to here? – user529295 Jun 28 '21 at 12:55
  • It refers to propagating information from later states to earlier ones (backward in time, sorta) – harwiltz Jun 28 '21 at 13:02
  • It may be a good idea to 1. also provide the [figures of the backups, e.g. those you can find in Sutton and Barto's book](http://incompleteideas.net/book/first/figures/figures.html), 2. link the OP to [this question](https://ai.stackexchange.com/q/11057/2444) about what the Bellman operator is, and 3. explain the symbols in your answer. – nbro Jun 28 '21 at 13:12
  • @nbro Thanks for the references. – user529295 Jun 28 '21 at 13:18