For questions related to reward shaping, a technique in which supplemental rewards are provided to make a problem easier to learn. Most problems have an obvious natural reward signal: for games it is usually a win or a loss, and for financial problems it is usually profit. Reward shaping augments this natural signal with additional rewards for making progress toward a good solution.
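As a concrete illustration, the most common scheme is potential-based shaping: add $F(s, s') = \gamma \Phi(s') - \Phi(s)$ to the natural reward, for some potential function $\Phi$ over states. Below is a minimal sketch; the toy gridworld, goal cell, distance-based potential, and discount factor are assumptions made only for the example and are not part of the tag description.

```python
# Minimal sketch of potential-based reward shaping (Ng, Harada & Russell, 1999).
# The gridworld, goal cell, potential function, and gamma are illustrative assumptions.

GAMMA = 0.99
GOAL = (4, 4)

def phi(state):
    """Potential of a state: negative Manhattan distance to the assumed goal cell."""
    x, y = state
    return -(abs(GOAL[0] - x) + abs(GOAL[1] - y))

def shaped_reward(natural_reward, state, next_state):
    """Augment the natural reward with F(s, s') = gamma * phi(s') - phi(s).

    A shaping term of this potential-based form gives denser feedback for progress
    toward the goal while leaving the optimal policy unchanged.
    """
    return natural_reward + GAMMA * phi(next_state) - phi(state)

# Example: moving from (2, 2) to (2, 3) with a natural reward of 0 yields a small
# positive shaped reward, because the agent moved one step closer to the goal.
print(shaped_reward(0.0, (2, 2), (2, 3)))
```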
Questions tagged [reward-shaping]
23 questions
8
votes
2 answers
How do we define the reward function for an environment?
How do you actually decide what reward value to give for each action in a given state for an environment?
Is this purely experimental and down to the programmer of the environment? So, is it a heuristic approach of simply trying different reward…

Hazzaldo
- 279
- 2
- 9
7
votes
1 answer
Why does a negative reward for every step really encourage the agent to reach the goal as quickly as possible?
If we shift all the rewards by a constant (a form of reward shaping), the optimal policy does not change (the optimal state-action values are all shifted by a corresponding constant). The proof of this fact can be found here.
If that's the case, then why does a negative…

nbro
- 39,006
- 12
- 98
- 176
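For the question above, a short worked equation makes the shift argument concrete (assuming the standard infinite-horizon discounted setting, which the excerpt does not state explicitly). Adding a constant $c$ to every reward shifts every state-action value by the same amount:
\begin{align}
Q'_\pi(s, a) &= \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^{t}\,(r_{t+1} + c) \,\middle|\, s_0 = s,\, a_0 = a\right] \\
&= Q_\pi(s, a) + \frac{c}{1-\gamma},
\end{align}
and adding the same constant to every action's value leaves the argmax over actions, and hence the greedy optimal policy, unchanged.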
7
votes
2 answers
What are some best practices when trying to design a reward function?
Generally speaking, is there a best-practice procedure to follow when trying to define a reward function for a reinforcement-learning agent? What common pitfalls are there when defining the reward function, and how should you avoid them? What…

12 rhombi in grid w no corners
- 185
- 1
- 8
6
votes
1 answer
Reward interpolation between MDPs. Will an optimal policy on both ends stay optimal inside the interval?
Say I've got two Markov Decision Processes (MDPs):
$$\mathcal{M}_0 = (\mathcal{S}, \mathcal{A}, P, R_0)\quad\text{and}\quad\mathcal{M}_1 = (\mathcal{S}, \mathcal{A}, P, R_1)$$
Both have the same set of states and actions, and the transition…

Kostya
- 2,416
- 7
- 23
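Since the excerpt above is cut off, one natural reading of the interpolation in the title (an assumption on my part) is a family of MDPs that share the states, actions, and transitions, with the reward interpolated linearly:
$$\mathcal{M}_\alpha = \bigl(\mathcal{S}, \mathcal{A}, P,\; (1-\alpha) R_0 + \alpha R_1\bigr), \qquad \alpha \in [0, 1],$$
so that $\mathcal{M}_0$ and $\mathcal{M}_1$ are recovered at the endpoints of the interval.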
6
votes
2 answers
Why does shifting all the rewards have a different impact on the performance of the agent?
I am new to reinforcement learning. For my application, I have found that if my reward function contains both negative and positive values, my model does not give the optimal solution, but the solution is not bad, as it still gives positive…

Fishfish
- 61
- 2
5
votes
1 answer
How to define a reward function for a humanoid agent whose goal is to stand up from the ground?
I'm trying to teach a humanoid agent how to stand up after falling. The episode starts with the agent lying on the floor with its back touching the ground, and its goal is to stand up in the shortest amount of time.
But I'm having trouble in regards…

Tirafesi
- 151
- 1
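A shaping pattern often used for stand-up tasks like the one above rewards torso height while penalizing elapsed time and control effort. The sketch below is purely illustrative; the observation fields, weights, and target height are assumptions, not the asker's environment.

```python
# Illustrative stand-up reward: reward torso height, penalize time and control effort.
# Field names, weights, and the target height are assumptions for the sketch.

def standup_reward(torso_height, ctrl_cost, target_height=1.3,
                   w_height=1.0, w_time=0.05, w_ctrl=0.01):
    """Dense per-step reward for getting (and staying) upright as quickly as possible."""
    progress = min(torso_height / target_height, 1.0)  # fraction of target height reached
    return w_height * progress - w_time - w_ctrl * ctrl_cost

# Example: lying on the ground vs. nearly upright, with the same control effort.
print(standup_reward(torso_height=0.2, ctrl_cost=1.0))   # small reward while on the ground
print(standup_reward(torso_height=1.25, ctrl_cost=1.0))  # close to the maximum per-step reward
```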
4
votes
1 answer
Is a reward given at every step or only given when the RL agent fails or succeeds?
In reinforcement learning, an agent can receive a positive reward for correct actions and a negative reward for wrong actions, but does the agent also receive rewards for every other step/action?

Dee
- 1,283
- 1
- 11
- 35
4
votes
1 answer
Can recovering a reward function using IRL lead to better policies compared to reward shaping?
I am working on a research project about the different reward functions being used in the RL domain. I have read up on Inverse Reinforcement Learning (IRL) and Reward Shaping (RS). I would like to clarify some doubts that I have with the 2…

calveeen
- 1,251
- 7
- 17
4
votes
1 answer
How to avoid rapid actuator movements in favor of smooth movements in a continuous space and action space problem?
I'm working on a continuous state / continuous action controller. It should control a given roll angle of an aircraft by issuing the correct aileron commands (in $[-1, 1]$).
To this end, I use a neural network and the DDPG algorithm, which shows…

opt12
- 171
- 4
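One common remedy for the question above is to shape the reward with an action-rate penalty, so that abrupt changes in the aileron command are discouraged. The tracking-error term and the weights below are assumptions made for illustration, not the asker's actual reward function.

```python
# Sketch of an action-rate penalty for smoother control. The tracking term and
# the weights are illustrative assumptions.

def smooth_control_reward(roll_error, action, prev_action, w_track=1.0, w_rate=0.5):
    """Reward accurate roll tracking while penalizing large step-to-step action changes."""
    tracking_term = -w_track * abs(roll_error)              # smaller roll error is better
    smoothness_term = -w_rate * abs(action - prev_action)   # rapid actuator movement is punished
    return tracking_term + smoothness_term

# Example: same tracking error, but the jerky command is penalized more heavily.
print(smooth_control_reward(0.1, action=0.8, prev_action=-0.7))   # jerky
print(smooth_control_reward(0.1, action=0.8, prev_action=0.75))   # smooth
```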
3
votes
2 answers
What should I do when the potential value of a state is too high?
I'm working on a Reinforcement Learning task where I use reward shaping as proposed in the paper Policy invariance under reward transformations:
Theory and application to reward shaping (1999) by Andrew Y. Ng, Daishi Harada and Stuart Russell.
In…

Marco Favorito
- 185
- 7
3
votes
2 answers
How to deal with a changing environment in reinforcement learning
I am new to RL and I'm currently working on implementing a DQN and DDPG agent for a 2D car parking environment. I want to train my agent so that it can successfully traverse the env and park in the designated goal in the middle.
So, my question is:…

ashesofphoenix
- 33
- 4
3
votes
1 answer
How can I fix jerky movement in a continuous action space?
I am training an agent to do object avoidance. The agent has control over its steering angle and its speed. The steering angle and speed are normalized in a $[−1,1]$ range, where the sign encodes direction (i.e. a speed of −1 means that it is going…

Shon Verch
- 65
- 4
3
votes
1 answer
How should I design the reward function for a racing game (where the goal is to reach the finishing line before the opponent)?
I'm building an agent for a racing game. In this game, there is a randomized map where there are speed boosts for the player to pick up and obstacles that act to slow the player down. The goal of the game is to reach the finishing line before the…

Ross Kohler
- 31
- 2
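For a racing setup like the one above, a typical shaping choice is a dense reward for progress along the track plus a terminal bonus that depends only on the race outcome. The sketch below is hypothetical; the weights, flags, and progress measure are assumptions, not the asker's game.

```python
# Illustrative racing reward: dense progress each step, a small obstacle penalty,
# and a terminal bonus only if the agent finishes before the opponent.

def racing_reward(progress_delta, hit_obstacle, finished, won,
                  w_progress=1.0, obstacle_penalty=0.5, win_bonus=10.0):
    reward = w_progress * progress_delta      # distance gained along the track this step
    if hit_obstacle:
        reward -= obstacle_penalty            # obstacles slow the player down
    if finished:
        reward += win_bonus if won else 0.0   # the large bonus depends only on the race outcome
    return reward

# Example: an ordinary step vs. the final step of a winning episode.
print(racing_reward(0.3, hit_obstacle=False, finished=False, won=False))
print(racing_reward(0.3, hit_obstacle=False, finished=True, won=True))
```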
3
votes
3 answers
Is the policy really invariant under affine transformations of the reward function?
In the context of a Markov decision process, this paper says
it is well-known that the optimal policy is invariant to positive affine transformation of the reward function
On the other hand, exercise 3.7 of Sutton and Barto gives an example of a…

IssaRice
- 171
- 3
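A quick way to see where the tension in the question above usually comes from, assuming the standard discounted-return algebra: under a positive affine map $r \mapsto a\,r + b$ with $a > 0$, the return becomes
$$G'_t = \sum_{k=0}^{\infty} \gamma^{k}\bigl(a\, r_{t+k+1} + b\bigr) = a\, G_t + \frac{b}{1-\gamma},$$
so in the infinite-horizon discounted case every policy's return is rescaled and shifted identically and the argmax is preserved. In an episodic task, however, the shift $b$ accumulates only for as many steps as the episode lasts, so policies that end episodes at different times are affected differently.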
3
votes
1 answer
Are there any reliable ways of modifying the reward function to make the rewards less sparse?
If I am training an agent to try and navigate a maze as fast as possible, a simple reward would be something like
\begin{align}
R(\text{terminal}) &= N - \text{time}\ \ , \ \ N \gg \text{everything} \\
R(\text{state})& = 0\ \ \text{if not…

Paradox
- 133
- 3