Questions tagged [expectation]

For questions related to the mathematical concept of "expectation" or "expected value".

25 questions
7
votes
2 answers

Why does the state-action value function, defined as an expected value of the reward and state value function, not need to follow a policy?

I often see that the state-action value function is expressed as: $$q_{\pi}(s,a)=\color{red}{\mathbb{E}_{\pi}}[R_{t+1}+\gamma G_{t+1} | S_t=s, A_t = a] = \color{blue}{\mathbb{E}}[R_{t+1}+\gamma v_{\pi}(s') |S_t = s, A_t =a]$$ Why does expressing the…
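A worked expansion may clarify the truncated excerpt (using the standard four-argument dynamics $p(s', r \mid s, a)$ from Sutton and Barto, which the excerpt itself does not show): once the action $a$ is given, the first transition depends only on the environment, so

$$q_{\pi}(s,a) = \sum_{s', r} p(s', r \mid s, a)\big[r + \gamma v_{\pi}(s')\big],$$

and the policy $\pi$ enters only through $v_{\pi}(s')$, i.e. through the action choices from $S_{t+1}$ onward.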
5
votes
1 answer

What does the argmax of the expectation of the log likelihood mean?

What does the following equation mean? What does each part of the formula represent or mean? $$\theta^* = \underset {\theta}{\arg \max} \Bbb E_{x \sim p_{data}} \log {p_{model}(x|\theta) }$$
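A minimal sketch of what the formula does in practice (the Gaussian model family and the grid search below are illustrative assumptions, not from the question): the expectation over $p_{data}$ is approximated by a sample mean of log-likelihoods, and $\arg\max_\theta$ picks the parameter under which the data is most probable.

```python
# A minimal sketch, assuming a Gaussian model family and a grid search;
# neither choice comes from the question itself.
# E_{x ~ p_data}[log p_model(x | theta)] is approximated by a sample mean,
# and argmax_theta picks the parameter that maximizes that mean.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=1000)  # stand-in samples from p_data

def avg_log_likelihood(theta, x):
    # Mean log-density of a unit-variance Gaussian with mean theta.
    return np.mean(-0.5 * np.log(2 * np.pi) - 0.5 * (x - theta) ** 2)

thetas = np.linspace(-5.0, 5.0, 1001)  # grid stands in for the argmax
theta_star = thetas[np.argmax([avg_log_likelihood(t, data) for t in thetas])]
print(theta_star)  # close to the true mean 2.0 (and to data.mean())
```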
5
votes
1 answer

Why is the mean used to compute the expectation in the GAN loss?

From Goodfellow et al. (2014), we have the adversarial loss: $$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))].$$ In practice, the expectation is…
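A minimal sketch of that practice (the "discriminator" below is a hypothetical sigmoid stand-in, not Goodfellow et al.'s network): each expectation is estimated by a Monte Carlo average over a minibatch.

```python
# A minimal sketch: the expectation E_{x ~ p_data}[log D(x)] is replaced by
# the mean over a minibatch (a Monte Carlo estimate).
import numpy as np

rng = np.random.default_rng(0)

def D(x):
    return 1.0 / (1.0 + np.exp(-x))   # maps samples into (0, 1)

batch = rng.normal(size=64)           # minibatch x_1..x_m ~ p_data
estimate = np.mean(np.log(D(batch)))  # (1/m) * sum_i log D(x_i)
print(estimate)
```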
5
votes
1 answer

If the current state is $S_t$ and the actions are chosen according to $\pi$, what is the expectation of $R_{t+1}$ in terms of $\pi$ and $p$?

I'm trying to solve Exercise 3.11 from Sutton and Barto's book (2nd edition): Exercise 3.11 If the current state is $S_t$, and actions are selected according to a stochastic policy $\pi$, then what is the expectation of $R_{t+1}$ in terms…
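A sketch of the expected shape of the answer, assuming the book's four-argument dynamics function $p(s', r \mid s, a)$: average over actions with $\pi$, then over next states and rewards with $p$:

$$\mathbb{E}_{\pi}[R_{t+1} \mid S_t = s] = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\, r.$$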
4
votes
1 answer

$E_{\pi}[R_{t+1}|S_t=s,A_t=a] = E[R_{t+1}|S_t=s,A_t=a]$?

I would like to solve the first question of Exercise 3.19 from Sutton and Barto: Exercise 3.19 The value of an action, $q_{\pi}(s, a)$, depends on the expected next reward and the expected sum of the remaining rewards. Again we can think of this in…
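The intuition, in the book's notation: once both $S_t = s$ and $A_t = a$ are conditioned on, the distribution of $R_{t+1}$ is fixed by the dynamics $p$ alone, so the subscript $\pi$ carries no extra information:

$$\mathbb{E}_{\pi}[R_{t+1} \mid S_t = s, A_t = a] = \sum_{r} r \sum_{s'} p(s', r \mid s, a) = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a].$$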
4
votes
1 answer

How is the state-value function expressed as a product of sums?

The state-value function for a given policy $\pi$ is given by $$\begin{align} V^{\pi}(s) &=E_{\pi}\left\{r_{t+1}+\gamma r_{t+2}+\gamma^{2} r_{t+3}+\cdots \mid s_{t}=s\right\} \\ &=E_{\pi}\left\{r_{t+1}+\gamma V^{\pi}\left(s_{t+1}\right) \mid…
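The "product of sums" in the title presumably refers to the final line of that derivation; in the first-edition notation the excerpt uses (with $\mathcal{P}^{a}_{ss'}$ for transition probabilities and $\mathcal{R}^{a}_{ss'}$ for expected rewards, an assumption about the source), it reads

$$V^{\pi}(s) = \sum_{a} \pi(s, a) \sum_{s'} \mathcal{P}^{a}_{ss'} \big[\mathcal{R}^{a}_{ss'} + \gamma V^{\pi}(s')\big].$$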
4
votes
2 answers

Why is $G_{t+1}$ replaced with $v_*(S_{t+1})$ in the Bellman optimality equation?

In equation 3.17 of Sutton and Barto's book: $$q_*(s, a)=\mathbb{E}[R_{t+1} + \gamma v_*(S_{t+1}) \mid S_t = s, A_t = a]$$ $G_{t+1}$ has been replaced with $v_*(S_{t+1})$, but no reason is given for this step. Can…
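A one-line sketch of the usual justification (the law of iterated expectations, assuming actions from $t+1$ onward are chosen by the optimal policy):

$$\mathbb{E}[G_{t+1} \mid S_t = s, A_t = a] = \mathbb{E}\big[\mathbb{E}[G_{t+1} \mid S_{t+1}] \mid S_t = s, A_t = a\big] = \mathbb{E}[v_*(S_{t+1}) \mid S_t = s, A_t = a].$$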
4
votes
2 answers

What is the difference between return and expected return?

At a time step $t$, for a state $S_{t}$, the return is defined as the discounted cumulative reward from that time step $t$. If an agent is following a policy (which in itself is a probability distribution of choosing a next state $S_{t+1}$ from…
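A minimal sketch on a hypothetical toy process (not from the question): a return is the discounted reward sum along one sampled trajectory, a random variable; the expected return is its average over many trajectories.

```python
# A minimal sketch: sample returns on a toy process with random 0/1 rewards.
import numpy as np

rng = np.random.default_rng(0)
gamma, horizon = 0.9, 50

def sample_return():
    rewards = rng.choice([0.0, 1.0], size=horizon)  # R_{t+1}, R_{t+2}, ...
    return sum(gamma**k * r for k, r in enumerate(rewards))

one_return = sample_return()  # one realization; differs from run to run
expected_return = np.mean([sample_return() for _ in range(10_000)])
print(one_return, expected_return)  # mean ~= 0.5 * (1 - gamma**horizon) / (1 - gamma)
```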
3
votes
1 answer

What is wrong with equation 7.3 in Sutton & Barto's book?

Equation 7.3 of Sutton and Barto's book: $$\max_s \big|\mathbb{E}_\pi[G_{t:t+n} \mid S_t = s] - v_\pi(s)\big| \le \gamma^n \max_s \big|V_{t+n-1}(s) - v_\pi(s)\big| $$ $$\text{where } G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} +…
3
votes
1 answer

How does $\mathbb{E}$ suddenly change to $\mathbb{E}_{\pi'}$ in this equation?

In Sutton and Barto's book, on page 63 (page 81 of the PDF): $$\mathbb{E}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t=s,A_t=\pi'(s)] = \mathbb{E}_{\pi'}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_{t} = s]$$ How does $\mathbb{E}$ suddenly change to…
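A sketch of why the two sides coincide (assuming, as in that chapter, that $\pi'$ is deterministic): conditioning on $A_t = \pi'(s)$ is exactly what "the action at time $t$ is chosen by $\pi'$" means, so

$$\mathbb{E}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s, A_t = \pi'(s)] = \sum_{s', r} p(s', r \mid s, \pi'(s))\big[r + \gamma v_\pi(s')\big] = \mathbb{E}_{\pi'}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s].$$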
3
votes
1 answer

What is meant by the expected BLEU cost when training with BLEU and SIMILE?

Recently I was reading a paper based on a new evaluation metric SIMILE. In a section, validation loss comparison had been made for SIMILE and BLEU. The plot showed the expected BLEU cost when training with BLEU and SIMILE. What I'm unable to…
3
votes
1 answer

Shouldn't expected return be calculated for some faraway time in the future $t+n$ instead of current time $t$?

I am learning RL for the first time. This may be naive, but I find it a bit odd to grasp the idea that, if the goal of RL is to maximize the expected return, shouldn't the expected return be calculated for some faraway time in the future ($t+n$)…
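One clarifying identity (the standard definition of the return in Sutton and Barto): the return indexed by the current time $t$ already aggregates the entire future,

$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1},$$

so maximizing the expected return at $t$ is maximizing (discounted) rewards at all later times, including $t+n$.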
2
votes
2 answers

How is per-decision importance sampling derived in Sutton & Barto's book?

In per-decision importance sampling, given in Sutton & Barto's book, Eq 5.12: $\rho_{t:T-1}R_{t+k} =…
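A sketch of the derivation's key step, in the book's notation (behavior policy $b$, target policy $\pi$): the importance-sampling ratios for time steps after the reward $R_{t+k}$ is received are independent of it and have expectation 1, so they drop out:

$$\mathbb{E}\big[\rho_{t:T-1} R_{t+k}\big] = \mathbb{E}\big[\rho_{t:t+k-1} R_{t+k}\big], \qquad \text{since } \mathbb{E}\!\left[\frac{\pi(A_j \mid S_j)}{b(A_j \mid S_j)}\right] = 1 \text{ for } j \ge t+k.$$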
2
votes
1 answer

Are these two definitions of the state-action value function equivalent?

I have been reading the Sutton and Barto textbook and going through David Silver's UCL lecture videos on YouTube and have a question on the equivalence of two forms of the state-action value function written in terms of the value function. From…
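A minimal numeric check (on a hypothetical random 2-state, 2-action MDP, not from the question) that the expectation form agrees with the explicit sum over the dynamics:

```python
# Check that E[R_{t+1} + gamma * v(S_{t+1}) | s, a] equals
# sum_{s', r} p(s', r | s, a) [r + gamma * v(s')] on a toy MDP.
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 2, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a, s'] transition probs
R = rng.normal(size=(nS, nA, nS))              # reward r(s, a, s')
v = rng.normal(size=nS)                        # any fixed state-value estimate

s, a = 0, 1
q_sum = sum(P[s, a, s2] * (R[s, a, s2] + gamma * v[s2]) for s2 in range(nS))
samples = rng.choice(nS, size=100_000, p=P[s, a])      # S_{t+1} ~ p(.|s,a)
q_mc = np.mean(R[s, a, samples] + gamma * v[samples])  # Monte Carlo mean
print(q_sum, q_mc)  # the two agree up to sampling error
```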
2
votes
2 answers

Why is there an expectation sign in the Bellman equation?

In chapter 3.5 of Sutton's book, the value function is defined as $v_\pi(s) \doteq \mathbb{E}_\pi[G_t \mid S_t = s]$. Can someone clarify why there is an expectation sign in front of the entire equation? Considering that the agent is following a fixed policy $\pi$, why there…
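A short sketch of the usual answer: even with $\pi$ fixed, the trajectory is still random, because $\pi(a \mid s)$ and the dynamics $p(s', r \mid s, a)$ are both probability distributions. The return $G_t$ is therefore a random variable, and the value is its mean:

$$v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s] = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\big[r + \gamma v_\pi(s')\big].$$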