Questions tagged [reward-clipping]

For questions about clipping the rewards observed in a problem (typically a Markov Decision Process) to a limited range, often [-1, 1]. This is sometimes done to stabilize learning, most notably in the DQN algorithm and related algorithms applied to reinforcement learning problems such as the Atari games.
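As a minimal sketch of what the tag refers to, the snippet below clips a batch of raw rewards to [-1, 1] with NumPy. The helper name `clip_reward` and its `low`/`high` parameters are hypothetical, chosen only for illustration; they are not taken from any particular DQN implementation.

```python
import numpy as np

def clip_reward(reward, low=-1.0, high=1.0):
    """Clip a scalar or an array of rewards into [low, high].

    Hypothetical helper for illustration; the default bounds follow
    the [-1, 1] range mentioned in the tag description.
    """
    return np.clip(reward, low, high)

# Example raw rewards, including values outside the target range.
raw_rewards = np.array([-5.0, -0.5, 0.0, 0.3, 10.0])
clipped = clip_reward(raw_rewards)
print(clipped)
```

Rewards already inside the range pass through unchanged, while larger magnitudes are saturated at the bounds, so the agent can no longer tell a reward of 10 from a reward of 1.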

3 questions
5 votes, 1 answer

Should the reward or the Q value be clipped for reinforcement learning?

When extending reinforcement learning to the continuous-state, continuous-action case, we must use function approximators (linear or non-linear) to approximate the Q-value. It is well known that non-linear function approximators, such as neural…
2 votes, 2 answers

What is the main difference between additive rewards and discounted rewards?

What is the difference between additive and discounted rewards?
1 vote, 0 answers

Deciding the rewards for different actions in Pong for a DQN agent

I am attempting to implement an agent that learns to play in the Pong environment. The environment was created in PyGame, and I return the pixel data and score at each frame. I use a CNN that takes a stack of the last 4 frames as input and predicts the…
RMMD12