Questions tagged [reward-functions]

For questions about reward functions (e.g. in the context of reinforcement learning), which may be denoted as $R(s, a)$.

86 questions
12
votes
3 answers

Why is the reward in reinforcement learning always a scalar?

I'm reading Reinforcement Learning by Sutton & Barto, and in section 3.2 they state that the reward in a Markov decision process is always a scalar real number. At the same time, I've heard about the problem of assigning credit to an action for a…
12
votes
4 answers

Counterexamples to the reward hypothesis

In Sutton and Barto's RL book, the reward hypothesis is stated as follows: that all of what we mean by goals and purposes can be well thought of as the maximization of the expected value of the cumulative sum of a received scalar signal (called…
8
votes
1 answer

What are other ways of handling invalid actions in scenarios where all rewards are either 0 (best reward) or negative?

I created an OpenAI Gym environment, and I would like to check the performance of the agent from OpenAI Baselines DQN approach on it. In my environment, the best possible outcome for the agent is 0 - the robot needs zero unnecessary resources to…
8
votes
2 answers

How do we define the reward function for an environment?

How do you actually decide what reward value to give for each action in a given state for an environment? Is this purely experimental and down to the programmer of the environment? So, is it a heuristic approach of simply trying different reward…
8
votes
1 answer

Suitable reward function for trading buy and sell orders

I am working to build a deep reinforcement learning agent which can place orders (i.e. limit buy and limit sell orders). The actions are {"Buy": 0 , "Do Nothing": 1, "Sell": 2}. Suppose that all the features are well suited for this task. I wanted…
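Answers to this kind of question often start from a per-step mark-to-market profit-and-loss reward. A minimal sketch under strong assumptions (single-unit position, hypothetical per-trade fee), not a recommendation of a specific trading reward:

```python
# Actions as in the question: 0 = Buy, 1 = Do Nothing, 2 = Sell.
BUY, HOLD, SELL = 0, 1, 2

def step_reward(action: int, position: int, price: float, next_price: float,
                fee: float = 0.0) -> tuple[float, int]:
    """Mark-to-market reward for one step with a single-unit position.

    position is -1 (short), 0 (flat) or +1 (long) before the action is applied.
    Returns (reward, new_position): the PnL of holding new_position over the
    price move, minus a hypothetical transaction fee when the position changes.
    """
    if action == BUY:
        new_position = 1
    elif action == SELL:
        new_position = -1
    else:
        new_position = position
    cost = fee if new_position != position else 0.0
    return new_position * (next_price - price) - cost, new_position
```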
7
votes
1 answer

Why does a negative reward for every step really encourage the agent to reach the goal as quickly as possible?

If we shift the rewards by any constant (which is a type of reward shaping), the optimal state-action value function (and so optimal policy) does not change. The proof of this fact can be found here. If that's the case, then why does a negative…
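For a continuing, discounted task, the invariance mentioned in the question follows directly from the definition of the return (a sketch of the standard argument, with notation as in Sutton & Barto):
$$Q'_\pi(s,a) \;=\; \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty} \gamma^{k}\,(R_{t+k+1} + c) \,\middle|\, S_t = s, A_t = a\right] \;=\; Q_\pi(s,a) + \frac{c}{1-\gamma},$$
so every action value is shifted by the same constant $c/(1-\gamma)$ and the greedy ordering of actions is unchanged. In an episodic task, however, the constant is only accumulated for as many steps as the episode lasts, which is why a negative per-step reward still favours reaching the goal quickly.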
7
votes
2 answers

What are some best practices when trying to design a reward function?

Generally speaking, is there a best-practice procedure to follow when trying to define a reward function for a reinforcement-learning agent? What common pitfalls are there when defining the reward function, and how should you avoid them? What…
6
votes
1 answer

How to improve the reward signal when the rewards are sparse?

In cases where the reward is delayed, this can negatively impact a model's ability to do proper credit assignment. In the case of a sparse reward, are there ways in which this can be mitigated? In a chess example, there are certain moves that you can…
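One often-cited way to densify a sparse reward without changing which policy is optimal is potential-based reward shaping (Ng et al., 1999): add $F(s, s') = \gamma\,\Phi(s') - \Phi(s)$ to the environment reward. A minimal sketch in Python, where the potential function and the goal position are hypothetical placeholders (in the chess example, $\Phi$ could be something like material balance):

```python
GAMMA = 0.99   # discount factor used by the learning algorithm
GOAL = (3, 3)  # hypothetical goal position for the toy potential below

def phi(state) -> float:
    """Toy potential: negative Manhattan distance to the goal.
    Only differences of phi matter, so any heuristic estimate of how
    good a state is can be used here."""
    x, y = state
    return -(abs(GOAL[0] - x) + abs(GOAL[1] - y))

def shaped_reward(env_reward: float, state, next_state, done: bool) -> float:
    """Add F(s, s') = gamma * phi(s') - phi(s) to the sparse environment reward.
    Taking the potential of terminal states to be 0 keeps the set of optimal
    policies identical to that of the unshaped problem."""
    next_potential = 0.0 if done else phi(next_state)
    return env_reward + GAMMA * next_potential - phi(state)
```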
6
votes
1 answer

What are the pros and cons of sparse and dense rewards in reinforcement learning?

From what I understand, if the rewards are sparse the agent will have to explore more to get rewards and learn the optimal policy, whereas if the rewards are dense in time, the agent is quickly guided towards its learning goal. Are the above…
6
votes
2 answers

Why does shifting all the rewards have a different impact on the performance of the agent?

I am new to reinforcement learning. For my application, I have found out that if my reward function contains some negative and positive values, my model does not give the optimal solution, but the solution is not bad as it still gives positive…
6
votes
1 answer

How should I handle invalid actions in a grid world?

I'm building a really simple experiment, where I let an agent move from the bottom-left corner to the upper-right corner of a $3 \times 3$ grid world. I plan to use DQN to do this. I'm having trouble handling the starting point: what if the Q…
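One common way to deal with actions that would leave the grid is to mask them out when the agent acts greedily, so invalid actions are never selected and their Q-values never matter. A minimal NumPy sketch, where the validity mask is a hypothetical signal provided by the environment:

```python
import numpy as np

def greedy_valid_action(q_values: np.ndarray, valid_mask: np.ndarray) -> int:
    """Pick the best action among the valid ones.

    q_values:   Q(s, a) estimates for all actions, shape (n_actions,)
    valid_mask: boolean array, True where the action is allowed in this state
    """
    masked_q = np.where(valid_mask, q_values, -np.inf)  # invalid actions -> -inf
    return int(np.argmax(masked_q))

# Example: in the bottom-left corner of the 3x3 grid only "up" and "right" are valid.
q = np.array([0.2, -0.1, 0.4, 0.0])          # up, down, right, left
mask = np.array([True, False, True, False])  # down/left would leave the grid
print(greedy_valid_action(q, mask))          # -> 2 ("right")
```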
6
votes
2 answers

How are the reward functions $R(s)$, $R(s, a)$ and $R(s, a, s')$ equivalent?

In this video, the lecturer states that $R(s)$, $R(s, a)$ and $R(s, a, s')$ are equivalent representations of the reward function. Intuitively, this is the case, according to the same lecturer, because $s$ can be made to represent the state and the…
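One direction of the equivalence can be written down explicitly: the coarser reward function is an expectation of the finer one under the transition model $p(s' \mid s, a)$,
$$R(s, a) \;=\; \sum_{s'} p(s' \mid s, a)\, R(s, a, s'),$$
while going from $R(s, a)$ or $R(s)$ to the finer forms is immediate, since the extra arguments can simply be ignored. Making $R(s)$ as expressive as $R(s, a)$ generally requires augmenting the state, as in the conversion question below.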
5
votes
1 answer

How do I convert an MDP with the reward function in the form $R(s,a,s')$ to an MDP with a reward function in the form $R(s,a)$?

The AIMA book has an exercise about showing that an MDP with rewards of the form $r(s, a, s')$ can be converted to an MDP with rewards $r(s, a)$, and to an MDP with rewards $r(s)$ with equivalent optimal policies. In the case of converting to $r(s)$…
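A sketch of one standard construction for the $r(s)$ part of the exercise (an assumption about the intended solution, not the book's own wording): enlarge the state so that it remembers the transition that produced it,
$$\tilde{S} = S \times A \times S, \qquad \tilde{P}\big((s, a, s') \mid (\cdot, \cdot, s), a\big) = P(s' \mid s, a), \qquad \tilde{r}\big((s, a, s')\big) = r(s, a, s'),$$
so the reward attached to the augmented state $(s, a, s')$ equals the original three-argument reward, and an optimal policy for the augmented MDP maps back to an optimal policy for the original one.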
5
votes
1 answer

How to define a reward function for a humanoid agent whose goal is to stand up from the ground?

I'm trying to teach a humanoid agent how to stand up after falling. The episode starts with the agent lying on the floor with its back touching the ground, and its goal is to stand up in the shortest amount of time. But I'm having trouble in regards…
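Stand-up rewards in answers to similar questions are usually dense and combine a progress term with a small time penalty. A minimal sketch; the target height, bonus and penalty values are hypothetical and would need tuning to the specific humanoid:

```python
def standup_reward(torso_height: float, target_height: float = 1.3,
                   upright_bonus: float = 1.0, time_penalty: float = 0.01) -> float:
    """Dense reward sketch for a get-up task.

    torso_height:  current height of the torso (or head) above the ground, in metres
    target_height: hypothetical height at which the agent counts as standing
    The height-proportional term guides the agent upward, the bonus fires once
    the target is reached, and the per-step penalty rewards standing up quickly.
    """
    progress = min(torso_height / target_height, 1.0)
    bonus = upright_bonus if torso_height >= target_height else 0.0
    return progress + bonus - time_penalty
```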
4
votes
1 answer

How can I ensure convergence of DDQN, if the true Q-values for different actions in the same state are very close?

I am applying a Double DQN algorithm to a highly stochastic environment where some of the actions in the agent's action space have very similar "true" Q-values (i.e. the expected future reward from either of these actions in the current state is…