3

Exercise 3.5 The equations in Section 3.1 are for the continuing case and need to be modified (very slightly) to apply to episodic tasks. Show that you know the modifications needed by giving the modified version of (3.3).

$\displaystyle\sum_{s^{\prime} \in S} \displaystyle\sum_{r \in R} p(s^{\prime}, r | s,a) = 1$, for all $s\in S, a \in A(s)$ (3.3)

Is it just about final states? So for $s \in S$ when $s$ is not final?

Jakub Bielan

2 Answers

2

> Is it just about final states? So for $s \in S$ when $s$ is not final?

You are thinking along the right lines, but you don't need to write out "when $s$ is not final" to express what you mean. That would be fine (and is used in some places), but the book gives you a more concise way of saying it.

As this is a formal exercise from the book, I don't want to write out an answer that any student could simply copy and paste.

Instead, I suggest you take a look at the notation section at the beginning of the book and find how Sutton & Barto use different set labels for all states including terminal states, and for all states excluding terminal states. Also, check carefully which of those sets needs to be summed over.

Neil Slater
  • Thank you! I guess you mean $S$ and $S^+$, so $s \in S$ already means it's not final. Still I'm not sure what's the difference between continuing and episodic cases in the context of this equation. – Jakub Bielan Mar 07 '19 at 16:57
  • @JakubBielan: Look at the sums – Neil Slater Mar 07 '19 at 16:57
  • I think I got it :) Once I'm able to upvote your answer, I'll be sure to do it – Jakub Bielan Mar 07 '19 at 17:06
  • @NeilSlater hey I wanted to clarify a doubt..I tagged you in the chat but probably you weren't notified...Can you check 'the singularity' and clarify my doubt? –  Mar 08 '19 at 04:30
0

I found myself going in circles for a while, so to clarify Neil Slater's answer:

At the beginning of the book, $S$ is defined as the "set of non-terminal states" and $S^+$ as the "set of all states, including the terminal ones".

$$\sum_{s^{\prime} \in S} \sum_{r \in R} p(s^{\prime}, r | s,a) = 1, \forall s \in S, a \in A(s) \tag{3.3}$$

That said, by restricting eq. 3.3 to $\forall s \in S$, we are saying that the formula does not apply once the agent is in a terminal state (which is obvious, because by definition no action is ever available in a terminal state).

It does not, however, constrain the probability of transitioning *into* a terminal state, and that is the key to answering the question.
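To make that last point concrete, here is a minimal Python sketch. The two-state MDP, its state and action names, and all of its probabilities are made up for illustration and do not come from the book. It only shows how much probability mass you account for when the next state $s'$ ranges over the non-terminal set versus the set that also includes the terminal state.

```python
# Hypothetical toy episodic MDP: names and probabilities are illustrative only.
NONTERMINAL = {"A", "B"}            # plays the role of S
TERMINAL = {"T"}
ALL_STATES = NONTERMINAL | TERMINAL  # plays the role of S^+

# DYNAMICS[(s, a)] maps (s', r) -> p(s', r | s, a).
# Only non-terminal states appear as keys, since no action is taken in "T".
DYNAMICS = {
    ("A", "go"): {("B", 0.0): 0.75, ("T", 1.0): 0.25},
    ("B", "go"): {("A", 0.0): 0.5, ("T", 1.0): 0.5},
}


def total_probability(state, action, next_states):
    """Sum p(s', r | s, a) over all rewards r and over s' in next_states."""
    return sum(
        prob
        for (s_next, _reward), prob in DYNAMICS[(state, action)].items()
        if s_next in next_states
    )


mass_nonterminal = total_probability("A", "go", NONTERMINAL)
mass_all = total_probability("A", "go", ALL_STATES)
print(mass_nonterminal, mass_all)
```

In this toy example the mass over non-terminal next states falls short of 1 for every state-action pair, because some probability goes to ending the episode. Comparing the two sums should tell you which set the index $s'$ has to range over.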

Philip Raeisghasem
Gigi