
I don't have much intuition about this equation. I have this Bellman update rule:

$$v_{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s',r} p(s',r \mid s,a) \left[ r + \gamma v_{k}(s') \right]$$

But where are the parentheses? Is the second sum using the index $a$ from the first sum? Or is it independent, so that I can move the $[r+ \gamma v_{k}(s')]$ term out of the sum?

nammerkage

1 Answer


Here's your equation with an additional pair of parentheses that emphasizes the order of the operations (note that you had a small typo in your original equation).

$$v_{\pi}(s) = \sum_{a} \pi(a \mid s) \left( \sum_{s',r} p(s',r \mid s,a) \left[ r + \gamma v_{\pi}(s') \right] \right)$$

Now, let me answer your other questions.

Is the second sum using the index $a$ from the first sum?

Yes.
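To see the dependence concretely, here is the sum written out under the illustrative assumption that there are only two actions, $a_1$ and $a_2$ (these names are not part of the question). Each inner sum is evaluated with its own value of $a$ plugged into the model:

$$
\begin{aligned}
v_{\pi}(s) ={}& \pi(a_1 \mid s) \sum_{s',r} p(s',r \mid s,a_1) \left[ r + \gamma v_{\pi}(s') \right] \\
&+ \pi(a_2 \mid s) \sum_{s',r} p(s',r \mid s,a_2) \left[ r + \gamma v_{\pi}(s') \right]
\end{aligned}
$$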

Or is it independent, and can I move out the $[r+ \gamma v_\pi(s')]$ term out of the sum?

No. You cannot move this term out of the sum, because the second sum is over $s'$ and $r$, and $r+ \gamma v_\pi(s')$ depends on both of them.

Note that $v_{\pi}(s)$ is defined as an expectation and that $\pi(a \mid s)$ (the policy) and $p(s',r \mid s,a)$ (the model) are probability distributions.
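As a rough illustration of the nesting, here is a minimal Python sketch that evaluates the right-hand side with two explicit loops. The tiny MDP (`states`, `actions`, `pi`, `p`, `gamma`) and the function name `backup` are made up for this example. The function takes the value estimates `v` as an argument, so it corresponds to one sweep of the update rule rather than to solving for $v_\pi$ directly.

```python
# Hypothetical two-state, two-action MDP; every number is invented for illustration.
states = [0, 1]
actions = [0, 1]
gamma = 0.9

# pi[s][a]: probability of choosing action a in state s, i.e. pi(a | s).
pi = {0: {0: 0.5, 1: 0.5}, 1: {0: 1.0, 1: 0.0}}

# p[(s, a)]: list of (next_state, reward, probability) triples, i.e. p(s', r | s, a).
p = {
    (0, 0): [(0, 0.0, 0.8), (1, 1.0, 0.2)],
    (0, 1): [(1, 2.0, 1.0)],
    (1, 0): [(0, 0.0, 1.0)],
    (1, 1): [(1, 0.0, 1.0)],
}


def backup(s, v):
    """Evaluate the nested sum for state s, given value estimates v."""
    total = 0.0
    for a in actions:  # outer sum over a
        inner = 0.0
        # Inner sum over (s', r). It looks up p[(s, a)], so it uses the
        # current a from the outer loop, and the bracketed term changes
        # with every (s', r) pair, so it has to stay inside this loop.
        for s_next, r, prob in p[(s, a)]:
            inner += prob * (r + gamma * v[s_next])
        total += pi[s][a] * inner
    return total


v = {s: 0.0 for s in states}
print(backup(0, v))
```

The inner loop cannot run without knowing `a`, which is the code-level counterpart of the implied parentheses.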

nbro
  • 1
    Worth noting this is a common convention when concatenating $\sum$ terms, that they are nested with implied parens as you show. I'm sure there will be some discipline that does not do that, but all the ML I have read does this. – Neil Slater Jan 18 '22 at 09:00
  • 1
    I'm sorry, I came back to this and got confused. Shouldn't it be like this: $v_{\pi}(s) = \sum_{a} \left( \pi(a \mid s) \sum_{s',r} p(s',r \mid s,a) \left[ r + \gamma v_{\pi}(s') \right] \right)$
    Since the second sum uses the same $a$ index. Putting the indexes the way @nbro does implies the sums are independent.
    – nammerkage Jan 18 '22 at 09:35
  • 1
    @nammerkage You can also write that **and** with the parentheses that I show in my answer. Both are ok. Parentheses here are used just to emphasize the precedence of the operations. So, you can also write $v_{\pi}(s) = \sum_{a} \left( \pi(a \mid s) \left( \sum_{s',r} p(s',r \mid s,a) \left[ r + \gamma v_{\pi}(s') \right] \right) \right)$
    – nbro Jan 18 '22 at 11:47