For questions related to reward shaping, a technique in which supplemental rewards are provided to make a problem easier to learn. Most problems have an obvious natural reward signal: for games it is usually a win or a loss, and for financial problems it is usually profit. Reward shaping augments this natural signal with additional rewards for making progress toward a good solution.
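As a concrete illustration, the most common scheme is potential-based shaping: add $F(s, s') = \gamma \Phi(s') - \Phi(s)$ to the natural reward, for some potential function $\Phi$ over states. Below is a minimal sketch; the toy gridworld, goal cell, distance-based potential, and discount factor are assumptions made only for the example and are not part of the tag description.

```python
# Minimal sketch of potential-based reward shaping (Ng, Harada & Russell, 1999).
# The gridworld, goal cell, potential function, and gamma are illustrative assumptions.

GAMMA = 0.99
GOAL = (4, 4)

def phi(state):
    """Potential of a state: negative Manhattan distance to the assumed goal cell."""
    x, y = state
    return -(abs(GOAL[0] - x) + abs(GOAL[1] - y))

def shaped_reward(natural_reward, state, next_state):
    """Augment the natural reward with F(s, s') = gamma * phi(s') - phi(s).

    A shaping term of this potential-based form gives denser feedback for progress
    toward the goal while leaving the optimal policy unchanged.
    """
    return natural_reward + GAMMA * phi(next_state) - phi(state)

# Example: moving from (2, 2) to (2, 3) with a natural reward of 0 yields a small
# positive shaped reward, because the agent moved one step closer to the goal.
print(shaped_reward(0.0, (2, 2), (2, 3)))
```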
Questions tagged [reward-shaping]
23 questions
8
votes
2 answers
How do we define the reward function for an environment?
How do you actually decide what reward value to give for each action in a given state for an environment?
Is this purely experimental and down to the programmer of the environment? So, is it a heuristic approach of simply trying different reward…

Hazzaldo
- 279
- 2
- 9
7
votes
1 answer
Why does a negative reward for every step really encourage the agent to reach the goal as quickly as possible?
If we shift all the rewards by a constant (a form of reward shaping), the optimal policy does not change (the optimal state-action values are all shifted by a corresponding constant). The proof of this fact can be found here.
If that's the case, then why does a negative…

nbro
- 39,006
- 12
- 98
- 176
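For the question above, a short worked equation makes the shift argument concrete (assuming the standard infinite-horizon discounted setting, which the excerpt does not state explicitly). Adding a constant $c$ to every reward shifts every state-action value by the same amount:
\begin{align}
Q'_\pi(s, a) &= \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^{t}\,(r_{t+1} + c) \,\middle|\, s_0 = s,\, a_0 = a\right] \\
&= Q_\pi(s, a) + \frac{c}{1-\gamma},
\end{align}
and adding the same constant to every action's value leaves the argmax over actions, and hence the greedy optimal policy, unchanged.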
7
votes
2 answers
What are some best practices when trying to design a reward function?
Generally speaking, is there a best-practice procedure to follow when trying to define a reward function for a reinforcement-learning agent? What common pitfalls are there when defining the reward function, and how should you avoid them? What…

12 rhombi in grid w no corners
- 185
- 1
- 8
6
votes
1 answer
Reward interpolation between MDPs. Will an optimal policy on both ends stay optimal inside the interval?
Say I've got two Markov Decision Processes (MDPs):
$$\mathcal{M}_0 = (\mathcal{S}, \mathcal{A}, P, R_0)\quad\text{and}\quad\mathcal{M}_1 = (\mathcal{S}, \mathcal{A}, P, R_1)$$
Both have the same set of states and actions, and the transition…

Kostya
- 2,416
- 7
- 23
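Since the excerpt above is cut off, one natural reading of the interpolation in the title (an assumption on my part) is a family of MDPs that share the states, actions, and transitions, with the reward interpolated linearly:
$$\mathcal{M}_\alpha = \bigl(\mathcal{S}, \mathcal{A}, P,\; (1-\alpha) R_0 + \alpha R_1\bigr), \qquad \alpha \in [0, 1],$$
so that $\mathcal{M}_0$ and $\mathcal{M}_1$ are recovered at the endpoints of the interval.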
6
votes
2 answers
Why does shifting all the rewards have a different impact on the performance of the agent?
I am new to reinforcement learning. For my application, I have found that if my reward function contains both negative and positive values, my model does not give the optimal solution, but the solution is not bad, as it still gives positive…

Fishfish
- 61
- 2
5
votes
1 answer
How to define a reward function for a humanoid agent whose goal is to stand up from the ground?
I'm trying to teach a humanoid agent how to stand up after falling. The episode starts with the agent lying on the floor with its back touching the ground, and its goal is to stand up in the shortest amount of time.
But I'm having trouble in regards…

Tirafesi
- 151
- 1
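A shaping pattern often used for stand-up tasks like the one above rewards torso height while penalizing elapsed time and control effort. The sketch below is purely illustrative; the observation fields, weights, and target height are assumptions, not the asker's environment.

```python
# Illustrative stand-up reward: reward torso height, penalize time and control effort.
# Field names, weights, and the target height are assumptions for the sketch.

def standup_reward(torso_height, ctrl_cost, target_height=1.3,
                   w_height=1.0, w_time=0.05, w_ctrl=0.01):
    """Dense per-step reward for getting (and staying) upright as quickly as possible."""
    progress = min(torso_height / target_height, 1.0)  # fraction of target height reached
    return w_height * progress - w_time - w_ctrl * ctrl_cost

# Example: lying on the ground vs. nearly upright, with the same control effort.
print(standup_reward(torso_height=0.2, ctrl_cost=1.0))   # small reward while on the ground
print(standup_reward(torso_height=1.25, ctrl_cost=1.0))  # close to the maximum per-step reward
```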
4
votes
1 answer
Is a reward given at every step or only given when the RL agent fails or succeeds?
In reinforcement learning, an agent can receive a positive reward for correct actions and a negative reward for wrong actions, but does the agent also receive rewards for every other step/action?

Dee
- 1,283
- 1
- 11
- 35
4
votes
1 answer
Can recovering a reward function using IRL lead to better policies compared to reward shaping?
I am working on a research project about the different reward functions being used in the RL domain. I have read up on Inverse Reinforcement Learning (IRL) and Reward Shaping (RS). I would like to clarify some doubts that I have with the 2…

calveeen
- 1,251
- 7
- 17
4
votes
1 answer
How to avoid rapid actuator movements in favor of smooth movements in a continuous space and action space problem?
I'm working on a continuous state / continuous action controller. It should control a given roll angle of an aircraft by issuing the correct aileron commands (in $[-1, 1]$).
To this end, I use a neural network and the DDPG algorithm, which shows…

opt12
- 171
- 4
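One common remedy for the question above is to shape the reward with an action-rate penalty, so that abrupt changes in the aileron command are discouraged. The tracking-error term and the weights below are assumptions made for illustration, not the asker's actual reward function.

```python
# Sketch of an action-rate penalty for smoother control. The tracking term and
# the weights are illustrative assumptions.

def smooth_control_reward(roll_error, action, prev_action, w_track=1.0, w_rate=0.5):
    """Reward accurate roll tracking while penalizing large step-to-step action changes."""
    tracking_term = -w_track * abs(roll_error)              # smaller roll error is better
    smoothness_term = -w_rate * abs(action - prev_action)   # rapid actuator movement is punished
    return tracking_term + smoothness_term

# Example: same tracking error, but the jerky command is penalized more heavily.
print(smooth_control_reward(0.1, action=0.8, prev_action=-0.7))   # jerky
print(smooth_control_reward(0.1, action=0.8, prev_action=0.75))   # smooth
```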
3
votes
2 answers
What should I do when the potential value of a state is too high?
I'm working on a Reinforcement Learning task where I use reward shaping as proposed in the paper Policy invariance under reward transformations:
Theory and application to reward shaping (1999) by Andrew Y. Ng, Daishi Harada and Stuart Russell.
In…

Marco Favorito
- 185
- 7
3
votes
2 answers
How to deal with a changing environment in reinforcement learning
I am new to RL and I'm currently working on implementing a DQN and DDPG agent for a 2D car parking environment. I want to train my agent so that it can successfully traverse the env and park in the designated goal in the middle.
So, my question is:…

ashesofphoenix
- 33
- 4
3
votes
1 answer
How can I fix jerky movement in a continuous action space?
I am training an agent to do object avoidance. The agent has control over its steering angle and its speed. The steering angle and speed are normalized in a $[−1,1]$ range, where the sign encodes direction (i.e. a speed of −1 means that it is going…

Shon Verch
- 65
- 4
3
votes
1 answer
How should I design the reward function for a racing game (where the goal is to reach the finishing line before the opponent)?
I'm building an agent for a racing game. In this game, there is a randomized map where there are speed boosts for the player to pick up and obstacles that act to slow the player down. The goal of the game is to reach the finishing line before the…

Ross Kohler
- 31
- 2
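For a racing setup like the one above, a typical shaping choice is a dense reward for progress along the track plus a terminal bonus that depends only on the race outcome. The sketch below is hypothetical; the weights, flags, and progress measure are assumptions, not the asker's game.

```python
# Illustrative racing reward: dense progress each step, a small obstacle penalty,
# and a terminal bonus only if the agent finishes before the opponent.

def racing_reward(progress_delta, hit_obstacle, finished, won,
                  w_progress=1.0, obstacle_penalty=0.5, win_bonus=10.0):
    reward = w_progress * progress_delta      # distance gained along the track this step
    if hit_obstacle:
        reward -= obstacle_penalty            # obstacles slow the player down
    if finished:
        reward += win_bonus if won else 0.0   # the large bonus depends only on the race outcome
    return reward

# Example: an ordinary step vs. the final step of a winning episode.
print(racing_reward(0.3, hit_obstacle=False, finished=False, won=False))
print(racing_reward(0.3, hit_obstacle=False, finished=True, won=True))
```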
3
votes
3 answers
Is the policy really invariant under affine transformations of the reward function?
In the context of a Markov decision process, this paper says
it is well-known that the optimal policy is invariant to positive affine transformation of the reward function
On the other hand, exercise 3.7 of Sutton and Barto gives an example of a…

IssaRice
- 171
- 3
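A quick way to see where the tension in the question above usually comes from, assuming the standard discounted-return algebra: under a positive affine map $r \mapsto a\,r + b$ with $a > 0$, the return becomes
$$G'_t = \sum_{k=0}^{\infty} \gamma^{k}\bigl(a\, r_{t+k+1} + b\bigr) = a\, G_t + \frac{b}{1-\gamma},$$
so in the infinite-horizon discounted case every policy's return is rescaled and shifted identically and the argmax is preserved. In an episodic task, however, the shift $b$ accumulates only for as many steps as the episode lasts, so policies that end episodes at different times are affected differently.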
3
votes
1 answer
Are there any reliable ways of modifying the reward function to make the rewards less sparse?
If I am training an agent to try and navigate a maze as fast as possible, a simple reward would be something like
\begin{align}
R(\text{terminal}) &= N - \text{time}\ \ , \ \ N \gg \text{everything} \\
R(\text{state})& = 0\ \ \text{if not…

Paradox
- 133
- 3