Questions tagged [sparse-rewards]

For questions about sparse rewards (or sparse reward functions), where the agent receives informative feedback only rarely, which can slow down learning. Reward shaping is commonly used to mitigate this problem.
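As a minimal illustration of the reward shaping mentioned above, a potential-based shaping term can be added to a sparse reward without changing the optimal policy (Ng, Harada & Russell, 1999). This is a sketch, not a prescribed implementation; the `potential` function is any heuristic you supply:

```python
def shaped_reward(reward, state, next_state, potential, gamma=0.99):
    """Potential-based reward shaping: r' = r + gamma * phi(s') - phi(s).

    Preserves the optimal policy while providing a dense learning
    signal. `potential` is any heuristic mapping states to scalars,
    e.g. negative distance to the goal.
    """
    return reward + gamma * potential(next_state) - potential(state)
```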

10 questions
6
votes
1 answer

How to improve the reward signal when the rewards are sparse?

In cases where the reward is delayed, this can negatively impact a model's ability to do proper credit assignment. In the case of a sparse reward, are there ways in which this can be mitigated? In a chess example, there are certain moves that you can…
6
votes
1 answer

What are the pros and cons of sparse and dense rewards in reinforcement learning?

From what I understand, if the rewards are sparse the agent will have to explore more to get rewards and learn the optimal policy, whereas if the rewards are dense in time, the agent is quickly guided towards its learning goal. Are the above…
4
votes
2 answers

How to apply Q-learning when the reward is only available at the last state?

I have a scheduling problem in which there are $n$ slots and $m$ clients. I am trying to solve the problem using Q-learning so I have made the following state-action model. A state $s_t$ is given by the current slot $t=1,2,\ldots,n$ and an action…
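For context on this kind of question: standard tabular Q-learning needs no modification for a terminal-only reward, because the bootstrapped targets propagate the final reward backward over repeated episodes. A minimal sketch, assuming a Gym-style episodic environment with `reset()`/`step()` (the names are illustrative, not from the question):

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=5000,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning; works even when the only nonzero reward
    arrives at the terminal state, since Q-values bootstrap backward."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy exploration.
            a = (env.action_space.sample() if np.random.rand() < epsilon
                 else int(np.argmax(Q[s])))
            s_next, r, done, _ = env.step(a)  # r == 0 until the last slot
            target = r if done else r + gamma * np.max(Q[s_next])
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```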
4
votes
1 answer

How does the optimization process in hindsight experience replay exactly work?

I was reading the following research paper Hindsight Experience Replay. This is the paper that introduces a concept called Hindsight Experience Replay (HER), which basically attempts to alleviate the infamous sparse reward problem. It is based on…
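The core mechanism the paper introduces is goal relabeling: failed transitions are stored a second time with an achieved state substituted for the intended goal, so they carry a reward signal after all. A minimal sketch of the "final" relabeling strategy, assuming goal-conditioned transitions stored as dicts and a hypothetical `reward_fn(achieved, goal)`; this is illustrative, not the paper's code:

```python
def her_relabel(episode, reward_fn):
    """Hindsight Experience Replay, 'final' strategy: replay each
    transition once with the original goal and once with the goal
    replaced by the state actually achieved at the end of the episode."""
    relabeled = []
    achieved_final = episode[-1]["achieved_goal"]
    for t in episode:
        # Original transition (sparse reward w.r.t. the intended goal).
        relabeled.append(t)
        # Hindsight transition: pretend the final achieved state was
        # the goal all along, turning the trajectory into a success.
        new_t = dict(t)
        new_t["goal"] = achieved_final
        new_t["reward"] = reward_fn(t["achieved_goal"], achieved_final)
        relabeled.append(new_t)
    return relabeled
```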
3
votes
1 answer

Are there any reliable ways of modifying the reward function to make the rewards less sparse?

If I am training an agent to try and navigate a maze as fast as possible, a simple reward would be something like \begin{align} R(\text{terminal}) &= N - \text{time}\ \ , \ \ N \gg \text{everything} \\ R(\text{state})& = 0\ \ \text{if not…
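One reliable modification for this maze setting is the potential-based shaping sketched under the tag description, with a maze-specific potential such as negative Manhattan distance to the goal. A sketch under that assumption (the grid-coordinate state representation and `goal` argument are illustrative):

```python
def manhattan_potential(state, goal):
    """Heuristic potential for a grid maze: closer to the goal => higher."""
    (x, y), (gx, gy) = state, goal
    return -(abs(x - gx) + abs(y - gy))

def dense_maze_reward(reward, state, next_state, goal, gamma=0.99):
    # Sparse terminal reward plus a shaping term rewarding progress
    # toward the goal; optimal policies are unchanged (Ng et al., 1999).
    return (reward
            + gamma * manhattan_potential(next_state, goal)
            - manhattan_potential(state, goal))
```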
3
votes
1 answer

Can reinforcement learning be used for tasks where only one final reward is received?

Is the reinforcement learning problem adaptable to a setting where there is only one, final, reward? I am aware of problems with sparse and delayed rewards, but what about a single reward at the end of a quite long path?
1
vote
1 answer

How do I compute the value function when the reward is only at the end in the context of actor-critic algorithms?

Consider the actor-critic reinforcement learning setting (actor and critic parameterized by neural networks). The reward is given only at the end of the episode (or, in the case of a timeout, not at all). How could we learn the value function?…
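One standard answer is to regress the critic on full Monte Carlo returns, which need no intermediate rewards: with a single terminal reward $R$, the target for the state at time $t$ is simply $\gamma^{T-t} R$. A minimal sketch (illustrative, not from the question):

```python
import numpy as np

def mc_value_targets(rewards, gamma=0.99):
    """Discounted returns G_t for critic regression. With a single
    terminal reward R (rewards = [0, 0, ..., R]) this reduces to
    G_t = gamma**(T - t) * R, a perfectly valid regression target."""
    G, targets = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        targets.append(G)
    return np.array(targets[::-1])

# e.g. a 4-step episode with reward only at the end:
# mc_value_targets([0, 0, 0, 1.0], gamma=0.9) -> [0.729, 0.81, 0.9, 1.0]
```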
0
votes
0 answers

Looking for a reinforcement learning algorithm that deals well with a model-based, curiosity-driven approach for chess AI

I am a software engineer who dabbled in machine learning (classifiers) during my thesis. After being out of it for a while, I decided I want to do a neural network project to learn from, specifically reinforcement learning. We'll see how…
0
votes
0 answers

How does Proximal Policy Optimization deal with sparse rewards?

In the original paper, the objective of PPO is the clipped surrogate $L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \operatorname{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\hat{A}_t\right)\right]$. My question is: how does this objective behave in a sparse reward setting (i.e., the reward is only given after a sequence of actions has been taken)? In this case we don't have $\hat{A}_{t}$…
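Worth noting for this question: $\hat{A}_t$ remains well-defined under sparse rewards, since generalized advantage estimation needs only the per-step rewards (mostly zero here) and the learned value function, which spreads the terminal signal backward. A sketch of GAE under those assumptions (names are illustrative):

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one episode. Works with
    sparse rewards: zero rewards contribute nothing, and the critic's
    bootstrapped values carry the terminal reward's signal to earlier
    timesteps. `values` has length len(rewards) + 1 (bootstrap value
    appended; use 0.0 for terminal states)."""
    advantages = np.zeros(len(rewards))
    gae_t = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae_t = delta + gamma * lam * gae_t
        advantages[t] = gae_t
    return advantages
```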
0
votes
0 answers

Reinforcement Learning with sparse/delayed reward - should intermediate rewards be decayed over time/training?

I'm thinking of a situation like a game (say, chess) where the real objective/reward is actually determined at the very end. I understand that it's important/helpful to do reward shaping with intermediate rewards, so that the agent can get clues of…
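A common pattern matching what the question describes is to anneal a coefficient on the shaping term, so the agent leans on intermediate rewards early and optimizes against the true terminal objective late in training. A minimal sketch; the linear schedule and parameter names are assumptions, not an established recipe:

```python
def annealed_reward(true_reward, shaping_reward, step, total_steps,
                    initial_coef=1.0, final_coef=0.0):
    """Blend intermediate (shaping) rewards with the true reward,
    linearly decaying the shaping coefficient over training so the
    final policy is optimized against the real objective only."""
    frac = min(step / total_steps, 1.0)
    coef = initial_coef + frac * (final_coef - initial_coef)
    return true_reward + coef * shaping_reward
```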