
In Chapter 12 of Sutton and Barto's book, the authors state that if the weights sum to 1, then the resulting update has "guaranteed convergence properties". Why does this condition ensure convergence?

Here is the full passage from Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Second Edition:

Now we note that a valid update can be done not just toward any n-step return, but toward any average of n-step returns for different ns. For example, an update can be done toward a target that is half of a two-step return and half of a four-step return: $\frac{1}{2}G_{t:t+2} + \frac{1}{2}G_{t:t+4}$. Any set of n-step returns can be averaged in this way, even an infinite set, as long as the weights on the component returns are positive and sum to 1. The composite return possesses an error reduction property similar to that of individual n-step returns (7.3) and thus can be used to construct updates with guaranteed convergence properties.
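To make the quoted target concrete, here is a minimal Python sketch of the example from the passage, $\frac{1}{2}G_{t:t+2} + \frac{1}{2}G_{t:t+4}$. The function names `n_step_return` and `compound_return`, the indexing convention, and the toy numbers are my own assumptions for illustration and are not taken from the book. It only shows how the target is computed; it does not demonstrate convergence.

```python
def n_step_return(rewards, values, t, n, gamma):
    """n-step return G_{t:t+n}, bootstrapping from the estimate V(S_{t+n}).

    Assumed convention (hypothetical): rewards[k] holds R_{k+1}, the reward
    received after leaving S_k, and values[k] holds the estimate V(S_k).
    Assumes t + n does not run past the end of the stored episode.
    """
    discounted_rewards = sum(gamma**k * rewards[t + k] for k in range(n))
    return discounted_rewards + gamma**n * values[t + n]


def compound_return(rewards, values, t, weights, gamma):
    """Weighted average of n-step returns; weights maps n -> weight."""
    # The passage requires positive weights that sum to 1.
    if any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be positive")
    if abs(sum(weights.values()) - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1")
    return sum(
        w * n_step_return(rewards, values, t, n, gamma)
        for n, w in weights.items()
    )


# Toy data, chosen only so the arithmetic is easy to follow.
rewards = [1.0, 0.0, 2.0, 1.0]      # R_1 .. R_4
values = [0.0, 0.5, 1.0, 0.5, 2.0]  # V(S_0) .. V(S_4)
gamma = 0.5

g2 = n_step_return(rewards, values, 0, 2, gamma)
g4 = n_step_return(rewards, values, 0, 4, gamma)
target = compound_return(rewards, values, 0, {2: 0.5, 4: 0.5}, gamma)
print(g2, g4, target)
```

With these numbers the two-step return is 1.25, the four-step return is 1.75, and the composite target is their average, 1.5.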

Daniel Wiczew

0 Answers