Questions tagged [expectation]
For questions related to the mathematical concept of "expectation" or "expected value".
25 questions
7 votes, 2 answers
Why does the state-action value function, defined as an expected value of the reward and state value function, not need to follow a policy?
I often see that the state-action value function is expressed as:
$$q_{\pi}(s,a)=\color{red}{\mathbb{E}_{\pi}}[R_{t+1}+\gamma G_{t+1} | S_t=s, A_t = a] = \color{blue}{\mathbb{E}}[R_{t+1}+\gamma v_{\pi}(s') |S_t = s, A_t =a]$$
Why does expressing the…

Daniel Wiczew
- 323
- 2
- 10
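A sketch of the idea, assuming the book's four-argument dynamics function $p(s', r \mid s, a)$: once the action $a$ is given and the policy dependence is folded into $v_{\pi}$, the only remaining randomness is the environment's, so no $\pi$ subscript is needed:
$$\mathbb{E}[R_{t+1} + \gamma v_{\pi}(S_{t+1}) \mid S_t = s, A_t = a] = \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma v_{\pi}(s')\right]$$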
5 votes, 1 answer
What does the argmax of the expectation of the log likelihood mean?
What does the following equation mean? What does each part of the formula represent or mean?
$$\theta^* = \underset{\theta}{\arg\max}\, \mathbb{E}_{x \sim p_{\text{data}}} \log p_{\text{model}}(x \mid \theta)$$

arash moradi
- 181
- 1
- 6
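A rough sketch of what the formula means in practice (the Gaussian model and grid search below are illustrative assumptions, not from the question): the expectation over $p_{data}$ is approximated by an average over samples, which turns the $\arg\max$ into ordinary maximum-likelihood estimation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=10_000)  # samples standing in for p_data

def avg_log_likelihood(mu, x, sigma=1.0):
    """Monte Carlo estimate of E_{x ~ p_data}[log p_model(x | mu)]."""
    return np.mean(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (x - mu) ** 2 / (2 * sigma**2))

# theta* = argmax over a simple grid of candidate means.
grid = np.linspace(0.0, 4.0, 401)
theta_star = grid[np.argmax([avg_log_likelihood(mu, x) for mu in grid])]
print(theta_star)  # close to the true mean 2.0
```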
5 votes, 1 answer
Why is the mean used to compute the expectation in the GAN loss?
From Goodfellow et al. (2014), we have the adversarial loss:
$$\min_G \, \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\,[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}\,[\log(1 - D(G(z)))]\,.$$
In practice, the expectation is…

A is for Ambition
- 153
- 4
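A minimal sketch of the point at issue (the `discriminator` and `generator` below are toy placeholder functions, not the paper's networks): each $\mathbb{E}$ is replaced in practice by the sample mean over a minibatch, an unbiased Monte Carlo estimate of the expectation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder networks: D maps data to (0, 1), G maps noise to data space.
def discriminator(x):
    return 1.0 / (1.0 + np.exp(-x))  # toy D(x): a sigmoid

def generator(z):
    return 2.0 * z + 1.0             # toy G(z): an affine map

x_real = rng.normal(2.0, 1.0, size=128)  # minibatch from p_data
z = rng.normal(0.0, 1.0, size=128)       # minibatch from p_z

# Minibatch means approximating the two expectations in V(D, G).
v_estimate = (np.mean(np.log(discriminator(x_real)))
              + np.mean(np.log(1.0 - discriminator(generator(z)))))
print(v_estimate)
```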
5 votes, 1 answer
If the current state is $S_t$ and the actions are chosen according to $\pi$, what is the expectation of $R_{t+1}$ in terms of $\pi$ and $p$?
I'm trying to solve Exercise 3.11 from Sutton and Barto's book (2nd edition).
Exercise 3.11 If the current state is $S_t$ , and actions are selected according to a stochastic policy $\pi$, then what is the expectation of $R_{t+1}$ in terms…

tmaric
- 382
- 2
- 8
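For orientation, one standard way to write this kind of answer, assuming the four-argument dynamics $p(s', r \mid s, a)$: the expectation first averages over the policy's action choice, then over the dynamics,
$$\mathbb{E}[R_{t+1} \mid S_t = s] = \sum_{a} \pi(a \mid s) \sum_{s', r} r \, p(s', r \mid s, a).$$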
4 votes, 1 answer
$E_{\pi}[R_{t+1}|S_t=s,A_t=a] = E[R_{t+1}|S_t=s,A_t=a]$?
I would like to solve the first question of Exercise 3.19 from Sutton and Barto:
Exercise 3.19 The value of an action, $q_{\pi}(s, a)$, depends on the expected next reward and
the expected sum of the remaining rewards. Again we can think of this in…

user
- 145
- 9
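A sketch of why the two sides agree: once $A_t = a$ is given, the distribution of the next reward is determined entirely by the dynamics $p$, so conditioning on the policy adds nothing:
$$\mathbb{E}_{\pi}[R_{t+1} \mid S_t = s, A_t = a] = \sum_{s', r} r \, p(s', r \mid s, a) = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a].$$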
4 votes, 1 answer
How is the state-value function expressed as a product of sums?
The state-value function for a given policy $\pi$ is given by
$$\begin{align}
V^{\pi}(s) &= E_{\pi}\left\{r_{t+1}+\gamma r_{t+2}+\gamma^{2} r_{t+3}+\cdots \mid s_{t}=s\right\} \\
&= E_{\pi}\left\{r_{t+1}+\gamma V^{\pi}\left(s_{t+1}\right) \mid s_{t}=s\right\}
\end{align}$$

hanugm
- 3,571
- 3
- 18
- 50
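For context, the "product of sums" in the title presumably refers to the expanded Bellman form, written here in the first-edition notation the excerpt's symbols suggest ($\pi(s, a)$ is the probability of taking $a$ in $s$):
$$V^{\pi}(s) = \sum_{a} \pi(s, a) \sum_{s'} \mathcal{P}^{a}_{ss'} \left[\mathcal{R}^{a}_{ss'} + \gamma V^{\pi}(s')\right].$$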
4 votes, 2 answers
Why is $G_{t+1}$ replaced with $v_*(S_{t+1})$ in the Bellman optimality equation?
In equation 3.17 of Sutton and Barto's book:
$$q_*(s, a)=\mathbb{E}[R_{t+1} + \gamma v_*(S_{t+1}) \mid S_t = s, A_t = a]$$
$G_{t+1}$ here has been replaced with $v_*(S_{t+1})$, but no reason has been provided for why this step has been taken.
Can…

ZERO NULLS
- 147
- 8
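One way to justify the substitution, sketched under the assumption that the optimal policy is followed from $t+1$ onward, is the tower property of conditional expectation:
$$\mathbb{E}[G_{t+1} \mid S_t = s, A_t = a] = \mathbb{E}\bigl[\,\mathbb{E}[G_{t+1} \mid S_{t+1}]\, \mid S_t = s, A_t = a\bigr] = \mathbb{E}[v_*(S_{t+1}) \mid S_t = s, A_t = a].$$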
4 votes, 2 answers
What is the difference between return and expected return?
At a time step $t$, for a state $S_{t}$, the return is defined as the discounted cumulative reward from that time step $t$.
If an agent is following a policy (which in itself is a probability distribution of choosing a next state $S_{t+1}$ from…

digi philos
- 41
- 1
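A toy numeric sketch of the distinction (all quantities invented for illustration): the return is computed from one sampled trajectory and is itself random, while the expected return averages over many trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9

def sampled_return(n_steps=50):
    """Discounted return G_t computed from one sampled reward sequence."""
    rewards = rng.normal(1.0, 0.5, size=n_steps)  # toy stochastic rewards
    return sum(gamma**k * r for k, r in enumerate(rewards))

g = sampled_return()  # one return: a single random quantity
v = np.mean([sampled_return() for _ in range(10_000)])  # estimated expected return
print(g, v)  # g varies run to run; v concentrates near sum_k gamma**k * 1.0
```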
3 votes, 1 answer
What is wrong with equation 7.3 in Sutton & Barto's book?
Equation 7.3 of Sutton & Barto's book:
$$\max_s \left|\mathbb{E}_\pi[G_{t:t+n} \mid S_t = s] - v_\pi(s)\right| \le \gamma^n \max_s \left|V_{t+n-1}(s) - v_\pi(s)\right|$$
$$\text{where } G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} +…

ZERO NULLS
- 147
- 8
3 votes, 1 answer
How does $\mathbb{E}$ suddenly change to $\mathbb{E}_{\pi'}$ in this equation?
In Sutton and Barto's book, on page 63 (81 of the PDF):
$$\mathbb{E}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t=s,A_t=\pi'(s)] = \mathbb{E}_{\pi'}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_{t} = s]$$
How does $\mathbb{E}$ suddenly change to…

ZERO NULLS
- 147
- 8
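A sketch of the step, assuming $\pi'$ is deterministic as in the policy improvement theorem: a one-step expectation under $\pi'$ expands as a sum over actions whose probability mass sits entirely on $a = \pi'(s)$,
$$\mathbb{E}_{\pi'}[X \mid S_t = s] = \sum_{a} \pi'(a \mid s)\, \mathbb{E}[X \mid S_t = s, A_t = a] = \mathbb{E}[X \mid S_t = s, A_t = \pi'(s)],$$
with $X = R_{t+1} + \gamma v_{\pi}(S_{t+1})$.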
3 votes, 1 answer
What is meant by the expected BLEU cost when training with BLEU and SIMILE?
Recently I was reading a paper that proposes a new evaluation metric, SIMILE. In one section, a validation loss comparison is made between SIMILE and BLEU: the plot shows the expected BLEU cost when training with BLEU and with SIMILE.
What I'm unable to…

develop97
- 31
- 2
3 votes, 1 answer
Shouldn't expected return be calculated for some faraway time in the future $t+n$ instead of current time $t$?
I am learning RL for the first time. It may be naive, but I find this idea a bit odd to grasp: if the goal of RL is to maximize the expected return, shouldn't the expected return be calculated for some faraway time in the future ($t+n$)…

SJa
- 371
- 2
- 15
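One definitional point that may help here: the return at time $t$ is itself a sum over the entire future,
$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1},$$
so an expected return "calculated at $t$" already accounts for every later time step.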
2 votes, 2 answers
How is per-decision importance sampling derived in Sutton & Barto's book?
In per-decision importance sampling as given in Sutton & Barto's book:
Eq. 5.12: $\rho_{t:T-1}R_{t+k} =…

ZERO NULLS
- 147
- 8
2 votes, 1 answer
Are these two definitions of the state-action value function equivalent?
I have been reading the Sutton and Barto textbook and going through David Silver's UCL lecture videos on YouTube, and I have a question on the equivalence of two forms of the state-action value function written in terms of the value function.
From…

David
- 4,591
- 1
- 6
- 25
2 votes, 2 answers
Why is there an expectation sign in the Bellman equation?
In Section 3.5 of Sutton and Barto's book, the value function is defined as:
Can someone clarify why there is an expectation sign in front of the entire equation? Considering that the agent is following a fixed policy $\pi$, why there…

Jack
- 23
- 3
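A toy numeric sketch (all numbers invented): even under a fixed policy, the action drawn from $\pi$ (and, in general, the next state drawn from the dynamics) is random, so the value must average over it.

```python
import numpy as np

# Fixed stochastic policy in state s: pi(a0|s) = 0.3, pi(a1|s) = 0.7 (toy numbers).
pi = np.array([0.3, 0.7])
# Expected one-step reward of each action (already averaged over the dynamics).
expected_reward = np.array([1.0, 5.0])

# Even with pi fixed, an expectation over the sampled action remains:
v_one_step = float(pi @ expected_reward)  # 0.3 * 1.0 + 0.7 * 5.0 = 3.8
print(v_one_step)
```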