For questions related to clipping the values of rewards observed in a problem (typically a Markov Decision Process) to a limited range (often [-1, 1]). This is sometimes done to stabilize the learning process, most notably in the DQN algorithm and related Reinforcement Learning algorithms applied to problems such as the Atari games.
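As a minimal sketch of the technique this tag describes (the helper name `clip_reward` is my own, not from any particular library), clipping a raw environment reward into [-1, 1] as in the DQN setup can be written as:

```python
def clip_reward(reward, low=-1.0, high=1.0):
    """Clamp a raw environment reward into [low, high].

    DQN-style reward clipping uses low=-1.0, high=1.0, so that large
    game-score deltas (e.g. +100 in some Atari games) do not dominate
    the temporal-difference error.
    """
    return max(low, min(high, reward))
```

For example, `clip_reward(100)` returns `1.0`, `clip_reward(-5)` returns `-1.0`, and rewards already inside the range, such as `0.5`, pass through unchanged.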
Questions tagged [reward-clipping]
3 questions
5
votes
1 answer
Should the reward or the Q-value be clipped for reinforcement learning?
When extending reinforcement learning to the continuous-state, continuous-action case, we must use function approximators (linear or non-linear) to approximate the Q-value. It is well known that non-linear function approximators, such as neural…

Rui Nian
- 423
- 3
- 13
2
votes
2 answers
What is the main difference between additive rewards and discounted rewards?
What is the difference between additive and discounted rewards?

Marosh Fatima
- 375
- 1
- 3
- 10
1
vote
0 answers
Deciding the rewards for different actions in Pong for a DQN agent
I am attempting to implement an agent that learns to play in the Pong environment; the environment was created in PyGame, and I return the pixel data and score at each frame. I use a CNN that takes a stack of the last 4 frames as input and predicts the…

RMMD12
- 11
- 1