Questions tagged [expectation]
For questions related to the mathematical concept of "expectation" or "expected value".
25 questions
7 votes, 2 answers
Why does the state-action value function, defined as an expected value of the reward and state value function, not need to follow a policy?
I often see that the state-action value function is expressed as:
$$q_{\pi}(s,a)=\color{red}{\mathbb{E}_{\pi}}[R_{t+1}+\gamma G_{t+1} | S_t=s, A_t = a] = \color{blue}{\mathbb{E}}[R_{t+1}+\gamma v_{\pi}(s') |S_t = s, A_t =a]$$
Why does expressing the…

Daniel Wiczew
- 323
- 2
- 10
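A sketch of the idea, assuming the book's four-argument dynamics function $p(s', r \mid s, a)$: once the action $a$ is given and the policy dependence is folded into $v_{\pi}$, the only remaining randomness is the environment's, so no $\pi$ subscript is needed:
$$\mathbb{E}[R_{t+1} + \gamma v_{\pi}(S_{t+1}) \mid S_t = s, A_t = a] = \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma v_{\pi}(s')\right]$$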
5 votes, 1 answer
What does the argmax of the expectation of the log likelihood mean?
What does the following equation mean? What does each part of the formula represent or mean?
$$\theta^* = \underset{\theta}{\arg\max}\, \mathbb{E}_{x \sim p_{\text{data}}} \log p_{\text{model}}(x \mid \theta)$$

arash moradi
- 181
- 1
- 6
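A rough sketch of what the formula means in practice (the Gaussian model and grid search below are illustrative assumptions, not from the question): the expectation over $p_{data}$ is approximated by an average over samples, which turns the $\arg\max$ into ordinary maximum-likelihood estimation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=10_000)  # samples standing in for p_data

def avg_log_likelihood(mu, x, sigma=1.0):
    """Monte Carlo estimate of E_{x ~ p_data}[log p_model(x | mu)]."""
    return np.mean(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (x - mu) ** 2 / (2 * sigma**2))

# theta* = argmax over a simple grid of candidate means.
grid = np.linspace(0.0, 4.0, 401)
theta_star = grid[np.argmax([avg_log_likelihood(mu, x) for mu in grid])]
print(theta_star)  # close to the true mean 2.0
```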
5 votes, 1 answer
Why is the mean used to compute the expectation in the GAN loss?
From Goodfellow et al. (2014), we have the adversarial loss:
$$\min_G \, \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\,[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}\,[\log(1 - D(G(z)))]\,.$$
In practice, the expectation is…

A is for Ambition
- 153
- 4
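A minimal sketch of the point at issue (the `discriminator` and `generator` below are toy placeholder functions, not the paper's networks): each $\mathbb{E}$ is replaced in practice by the sample mean over a minibatch, an unbiased Monte Carlo estimate of the expectation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder networks: D maps data to (0, 1), G maps noise to data space.
def discriminator(x):
    return 1.0 / (1.0 + np.exp(-x))  # toy D(x): a sigmoid

def generator(z):
    return 2.0 * z + 1.0             # toy G(z): an affine map

x_real = rng.normal(2.0, 1.0, size=128)  # minibatch from p_data
z = rng.normal(0.0, 1.0, size=128)       # minibatch from p_z

# Minibatch means approximating the two expectations in V(D, G).
v_estimate = (np.mean(np.log(discriminator(x_real)))
              + np.mean(np.log(1.0 - discriminator(generator(z)))))
print(v_estimate)
```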
5 votes, 1 answer
If the current state is $S_t$ and the actions are chosen according to $\pi$, what is the expectation of $R_{t+1}$ in terms of $\pi$ and $p$?
I'm trying to solve Exercise 3.11 from Sutton and Barto's book (2nd edition).
Exercise 3.11 If the current state is $S_t$ , and actions are selected according to a stochastic policy $\pi$, then what is the expectation of $R_{t+1}$ in terms…

tmaric
- 382
- 2
- 8
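For orientation, one standard way to write this kind of answer, assuming the four-argument dynamics $p(s', r \mid s, a)$: the expectation first averages over the policy's action choice, then over the dynamics,
$$\mathbb{E}[R_{t+1} \mid S_t = s] = \sum_{a} \pi(a \mid s) \sum_{s', r} r \, p(s', r \mid s, a).$$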
4 votes, 1 answer
$E_{\pi}[R_{t+1}|S_t=s,A_t=a] = E[R_{t+1}|S_t=s,A_t=a]$?
I would like to solve the first question of Exercise 3.19 from Sutton and Barto:
Exercise 3.19 The value of an action, $q_{\pi}(s, a)$, depends on the expected next reward and
the expected sum of the remaining rewards. Again we can think of this in…

user
- 145
- 9
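A sketch of why the two sides agree: once $A_t = a$ is given, the distribution of the next reward is determined entirely by the dynamics $p$, so conditioning on the policy adds nothing:
$$\mathbb{E}_{\pi}[R_{t+1} \mid S_t = s, A_t = a] = \sum_{s', r} r \, p(s', r \mid s, a) = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a].$$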
4 votes, 1 answer
How is the state-value function expressed as a product of sums?
The state-value function for a given policy $\pi$ is given by
$$\begin{align}
V^{\pi}(s) &= E_{\pi}\left\{r_{t+1}+\gamma r_{t+2}+\gamma^{2} r_{t+3}+\cdots \mid s_{t}=s\right\} \\
&= E_{\pi}\left\{r_{t+1}+\gamma V^{\pi}\left(s_{t+1}\right) \mid s_{t}=s\right\}
\end{align}$$

hanugm
- 3,571
- 3
- 18
- 50
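For context, the "product of sums" in the title presumably refers to the expanded Bellman form, written here in the first-edition notation the excerpt's symbols suggest ($\pi(s, a)$ is the probability of taking $a$ in $s$):
$$V^{\pi}(s) = \sum_{a} \pi(s, a) \sum_{s'} \mathcal{P}^{a}_{ss'} \left[\mathcal{R}^{a}_{ss'} + \gamma V^{\pi}(s')\right].$$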
4 votes, 2 answers
Why is $G_{t+1}$ replaced with $v_*(S_{t+1})$ in the Bellman optimality equation?
In equation 3.17 of Sutton and Barto's book:
$$q_*(s, a)=\mathbb{E}[R_{t+1} + \gamma v_*(S_{t+1}) \mid S_t = s, A_t = a]$$
$G_{t+1}$ here has been replaced with $v_*(S_{t+1})$, but no reason has been provided for why this step has been taken.
Can…

ZERO NULLS
- 147
- 8
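One way to justify the substitution, sketched under the assumption that the optimal policy is followed from $t+1$ onward, is the tower property of conditional expectation:
$$\mathbb{E}[G_{t+1} \mid S_t = s, A_t = a] = \mathbb{E}\bigl[\,\mathbb{E}[G_{t+1} \mid S_{t+1}]\, \mid S_t = s, A_t = a\bigr] = \mathbb{E}[v_*(S_{t+1}) \mid S_t = s, A_t = a].$$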
4 votes, 2 answers
What is the difference between return and expected return?
At a time step $t$, for a state $S_{t}$, the return is defined as the discounted cumulative reward from that time step $t$.
If an agent is following a policy (which in itself is a probability distribution of choosing a next state $S_{t+1}$ from…

digi philos
- 41
- 1
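A toy numeric sketch of the distinction (all quantities invented for illustration): the return is computed from one sampled trajectory and is itself random, while the expected return averages over many trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9

def sampled_return(n_steps=50):
    """Discounted return G_t computed from one sampled reward sequence."""
    rewards = rng.normal(1.0, 0.5, size=n_steps)  # toy stochastic rewards
    return sum(gamma**k * r for k, r in enumerate(rewards))

g = sampled_return()  # one return: a single random quantity
v = np.mean([sampled_return() for _ in range(10_000)])  # estimated expected return
print(g, v)  # g varies run to run; v concentrates near sum_k gamma**k * 1.0
```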
3 votes, 1 answer
What is wrong with equation 7.3 in Sutton & Barto's book?
Equation 7.3 of Sutton & Barto's book:
$$\max_s \left|\mathbb{E}_\pi[G_{t:t+n} \mid S_t = s] - v_\pi(s)\right| \le \gamma^n \max_s \left|V_{t+n-1}(s) - v_\pi(s)\right|$$
$$\text{where } G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} +…

ZERO NULLS
- 147
- 8
3 votes, 1 answer
How does $\mathbb{E}$ suddenly change to $\mathbb{E}_{\pi'}$ in this equation?
In Sutton and Barto's book, on page 63 (81 of the PDF):
$$\mathbb{E}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t=s,A_t=\pi'(s)] = \mathbb{E}_{\pi'}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_{t} = s]$$
How does $\mathbb{E}$ suddenly change to…

ZERO NULLS
- 147
- 8
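A sketch of the step, assuming $\pi'$ is deterministic as in the policy improvement theorem: a one-step expectation under $\pi'$ expands as a sum over actions whose probability mass sits entirely on $a = \pi'(s)$,
$$\mathbb{E}_{\pi'}[X \mid S_t = s] = \sum_{a} \pi'(a \mid s)\, \mathbb{E}[X \mid S_t = s, A_t = a] = \mathbb{E}[X \mid S_t = s, A_t = \pi'(s)],$$
with $X = R_{t+1} + \gamma v_{\pi}(S_{t+1})$.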
3 votes, 1 answer
What is meant by the expected BLEU cost when training with BLEU and SIMILE?
Recently I was reading a paper that proposes a new evaluation metric, SIMILE. In one section, a validation loss comparison is made between SIMILE and BLEU: the plot shows the expected BLEU cost when training with BLEU and with SIMILE.
What I'm unable to…

develop97
- 31
- 2
3 votes, 1 answer
Shouldn't expected return be calculated for some faraway time in the future $t+n$ instead of current time $t$?
I am learning RL for the first time. It may be naive, but I find this idea a bit odd to grasp: if the goal of RL is to maximize the expected return, shouldn't the expected return be calculated for some faraway time in the future ($t+n$)…

SJa
- 371
- 2
- 15
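One definitional point that may help here: the return at time $t$ is itself a sum over the entire future,
$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1},$$
so an expected return "calculated at $t$" already accounts for every later time step.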
2 votes, 2 answers
How is per-decision importance sampling derived in Sutton & Barto's book?
In per-decision importance sampling as given in Sutton & Barto's book:
Eq. 5.12: $\rho_{t:T-1}R_{t+k} =…

ZERO NULLS
- 147
- 8
2 votes, 1 answer
Are these two definitions of the state-action value function equivalent?
I have been reading the Sutton and Barto textbook and going through David Silver's UCL lecture videos on YouTube, and I have a question on the equivalence of two forms of the state-action value function written in terms of the value function.
From…

David
- 4,591
- 1
- 6
- 25
2 votes, 2 answers
Why is there an expectation sign in the Bellman equation?
In Section 3.5 of Sutton and Barto's book, the value function is defined as:
Can someone clarify why there is an expectation sign in front of the entire equation? Considering that the agent is following a fixed policy $\pi$, why there…

Jack
- 23
- 3
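A toy numeric sketch (all numbers invented): even under a fixed policy, the action drawn from $\pi$ (and, in general, the next state drawn from the dynamics) is random, so the value must average over it.

```python
import numpy as np

# Fixed stochastic policy in state s: pi(a0|s) = 0.3, pi(a1|s) = 0.7 (toy numbers).
pi = np.array([0.3, 0.7])
# Expected one-step reward of each action (already averaged over the dynamics).
expected_reward = np.array([1.0, 5.0])

# Even with pi fixed, an expectation over the sampled action remains:
v_one_step = float(pi @ expected_reward)  # 0.3 * 1.0 + 0.7 * 5.0 = 3.8
print(v_one_step)
```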