For questions about the concept of reward, for example in the context of reinforcement learning and Markov decision processes. For questions about reward functions, reward design, reward shaping, reward hacking, etc., there are more specific tags, so use those instead of this generic one, unless your question is also about the concept of reward itself.
Questions tagged [rewards]
118 questions
14 votes, 6 answers
What would motivate a machine?
Currently, within the AI development field, the main focus seems to be on pattern recognition and machine learning. Learning is about adjusting internal variables based on a feedback loop.
Maslow's hierarchy of needs is a theory in psychology…

Aleksei Maide
12 votes, 3 answers
Why is the reward in reinforcement learning always a scalar?
I'm reading Reinforcement Learning by Sutton & Barto, and in section 3.2 they state that the reward in a Markov decision process is always a scalar real number. At the same time, I've heard about the problem of assigning credit to an action for a…

Sid Mani
10 votes, 1 answer
What is the difference between expected return and value function?
I've seen numerous mathematical explanations of reward, value functions $V(s)$, and return functions. The reward provides an immediate return for being in a specific state. The better the reward, the better the state.
As I understand it, it can be…
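For reference, the standard definitions (as in Sutton & Barto) separate the three quantities cleanly: the reward $R_{t+1}$ is a single number received at one time step, the return is the discounted sum of future rewards, and the value function is the expected return under the policy:
$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \qquad v_\pi(s) = \mathbb{E}_\pi\left[G_t \mid S_t = s\right].$$
So the return is a random variable attached to a particular trajectory, while the value function averages that random variable over all trajectories the policy can generate from $s$.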

user3168961
10 votes, 2 answers
How do I handle negative rewards in policy gradients with the cross-entropy loss function?
I am using policy gradients in my reinforcement learning algorithm, and occasionally my environment provides a severe penalty (i.e. negative reward) when a wrong move is made. I'm using a neural network with stochastic gradient descent to learn the…
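As a point of reference only (not the asker's code), here is a minimal PyTorch sketch with made-up dimensions (4-dimensional observations, 2 actions) of a REINFORCE-style update in which the cross-entropy term is weighted by the reward, so a negative reward pushes the chosen action's probability down instead of up:

```python
import torch
import torch.nn as nn

# Hypothetical tiny policy network: 4-dim observation -> 2 action logits.
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)

state = torch.randn(1, 4)      # placeholder observation
dist = torch.distributions.Categorical(logits=policy(state))
action = dist.sample()         # action sampled from the current policy
reward = -5.0                  # severe penalty returned by the environment

# REINFORCE-style loss: -log pi(a|s) * R. With R < 0 the sign flips,
# so gradient descent decreases the probability of the penalized action.
loss = (-dist.log_prob(action) * reward).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```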

jstaker7
8 votes, 2 answers
Why does the "reward to go" trick in policy gradient methods work?
In the policy gradient method, there's a trick to reduce the variance of the policy gradient. We use causality and drop part of the sum over rewards, so that only the rewards that come after an action are taken into account (see here…
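For reference, the estimator in question is usually written (this is the standard form, not a quote from the linked source) as
$$\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_{i,t} \mid s_{i,t}) \left( \sum_{t'=t}^{T} r(s_{i,t'}, a_{i,t'}) \right).$$
The terms that get dropped are the rewards earned before time $t$; they do not depend on the action chosen at time $t$, so removing them leaves the gradient unbiased while reducing its variance.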

Konstantin Solomatov
7 votes, 2 answers
Is there any difference between reward and return in reinforcement learning?
I am reading Sutton and Barto's book on reinforcement learning. I thought that reward and return were the same thing.
However, in Section 5.6 of the book (first paragraph, third line), it is written:
Whereas in Chapter 2 we averaged rewards, in…

SJa
6 votes, 2 answers
What is the difference between a loss function and reward/penalty in Deep Reinforcement Learning?
In Deep Reinforcement Learning (DRL), I am having difficulty understanding the difference between a loss function and a reward/penalty, and how the two are integrated in DRL.
Loss function: Given an output of the model and the ground truth,…
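One way to frame the distinction (a sketch of the usual framing, not quoted from an answer): the reward is part of the problem, supplied by the environment and defining the objective, whereas the loss is part of the solution, a differentiable surrogate that the optimizer actually minimizes. For example, in Q-learning with function approximation,
$$J(\pi) = \mathbb{E}\left[\sum_{t} \gamma^t R_{t+1}\right] \quad \text{(objective built from rewards)}, \qquad L(\theta) = \left(r + \gamma \max_{a'} Q_\theta(s', a') - Q_\theta(s, a)\right)^2 \quad \text{(loss minimized by gradient descent)}.$$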

Theo Deep
6 votes, 1 answer
Reward interpolation between MDPs. Will an optimal policy on both ends stay optimal inside the interval?
Say I've got two Markov Decision Processes (MDPs):
$$\mathcal{M_0} = (\mathcal{S}, \mathcal{A}, P, R_0),\quad\text{and}\quad\mathcal{M}_1 = (\mathcal{S}, \mathcal{A}, P, R_1)$$
Both have the same set of states and actions, and the transition…
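Writing the interpolation explicitly (assuming the natural linear form, which the excerpt does not show), the question is whether a policy optimal in both $\mathcal{M}_0$ and $\mathcal{M}_1$ remains optimal in
$$\mathcal{M}_\lambda = (\mathcal{S}, \mathcal{A}, P, R_\lambda), \qquad R_\lambda = (1 - \lambda) R_0 + \lambda R_1, \quad \lambda \in [0, 1].$$
Since the dynamics are shared and expectation is linear, the value of any fixed policy interpolates the same way, $V^\pi_\lambda = (1 - \lambda) V^\pi_0 + \lambda V^\pi_1$, which is the natural starting point for reasoning about the question.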

Kostya
6 votes, 2 answers
Why does shifting all the rewards have a different impact on the performance of the agent?
I am new to reinforcement learning. For my application, I have found that if my reward function contains both negative and positive values, my model does not give the optimal solution, but the solution is not bad, as it still gives positive…
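For a concrete sense of why a constant shift can matter (a standard observation, not taken from the question itself): adding $c$ to every reward adds a length-dependent term to the return,
$$\sum_{t=0}^{T-1} \gamma^t (r_t + c) = \sum_{t=0}^{T-1} \gamma^t r_t + c \, \frac{1 - \gamma^T}{1 - \gamma},$$
so in episodic tasks where the agent can influence the episode length $T$, the shift changes which policies are optimal (a positive shift rewards dragging episodes out, a negative one rewards ending them early), whereas in an infinite-horizon discounted setting it only adds the constant $c/(1-\gamma)$ to every state's value.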

Fishfish
6 votes, 1 answer
Why cannot an AI agent adjust the reward function directly?
In standard reinforcement learning, the reward function is specified by an AI designer and is external to the AI agent. The agent attempts to find a behaviour that collects a higher cumulative discounted reward. In Evolutionary Reinforcement Learning…

rodan
6 votes, 2 answers
Reinforcement Learning with long term rewards and fixed states and actions
I have read a lot about RL algorithms that update the action-value function at each step with the reward gained at that step. The requirement here is that a reward is obtained after each step.
I have a case where I have three steps that have to…

Jan
5 votes, 1 answer
Non-differentiable reward function to update a neural network
In Reinforcement Learning, when the reward function is not differentiable, a policy gradient algorithm is used to update the weights of a network. In the paper Neural Architecture Search with Reinforcement Learning, they use the accuracy of one neural…
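The reason a non-differentiable reward is not a problem is visible in the score-function (REINFORCE) estimator, sketched here in its usual form:
$$\nabla_\theta J(\theta) = \nabla_\theta \, \mathbb{E}_{a \sim \pi_\theta}\left[R(a)\right] = \mathbb{E}_{a \sim \pi_\theta}\left[R(a) \, \nabla_\theta \log \pi_\theta(a)\right].$$
The gradient falls only on $\log \pi_\theta$, so $R$ (e.g. the validation accuracy of a sampled architecture) is treated as a black-box number and is never differentiated.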

samsambakster
5 votes, 1 answer
If the current state is $S_t$ and the actions are chosen according to $\pi$, what is the expectation of $R_{t+1}$ in terms of $\pi$ and $p$?
I'm trying to solve Exercise 3.11 from Sutton and Barto's book (2nd edition):
Exercise 3.11 If the current state is $S_t$ , and actions are selected according to a stochastic policy $\pi$, then what is the expectation of $R_{t+1}$ in terms…
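For orientation, the quantity the exercise asks for can be written in the book's notation, using the four-argument dynamics function $p(s', r \mid s, a)$, as
$$\mathbb{E}_\pi\left[R_{t+1} \mid S_t = s\right] = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \, r.$$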

tmaric
4 votes, 1 answer
$E_{\pi}[R_{t+1}|S_t=s,A_t=a] = E[R_{t+1}|S_t=s,A_t=a]$?
I would like to solve the first question of Exercise 3.19 from Sutton and Barto:
Exercise 3.19 The value of an action, $q_{\pi}(s, a)$, depends on the expected next reward and
the expected sum of the remaining rewards. Again we can think of this in…
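In the book's notation, the decomposition the exercise points at reads
$$q_\pi(s, a) = \mathbb{E}\left[R_{t+1} + \gamma G_{t+1} \mid S_t = s, A_t = a\right] = \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma \, v_\pi(s')\right],$$
and the equality in the title holds because, once both the state and the action are fixed, the distribution of the immediate reward no longer depends on the policy $\pi$.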

user
4 votes, 2 answers
Why is regret so defined in MABs?
Consider a multi-armed bandit (MAB). There are $k$ arms, with reward distributions $R_i$ where $1 \leq i \leq k$. Let $\mu_i$ denote the mean of the $i^{th}$ distribution.
If we run the multi-armed bandit experiment for $T$ rounds, the "pseudo…
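The definition being referred to, the standard pseudo-regret with $\mu^* = \max_{1 \leq i \leq k} \mu_i$, is
$$\bar{\mathcal{R}}_T = T \mu^* - \mathbb{E}\left[\sum_{t=1}^{T} \mu_{A_t}\right],$$
i.e. the expected gap, over $T$ rounds, between always playing the best arm in expectation and playing the arms $A_t$ the algorithm actually chose.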

stoic-santiago