Questions tagged [hindsight-experience-replay]
For questions about Hindsight Experience Replay (HER), proposed in the paper "Hindsight Experience Replay" (2017) by Marcin Andrychowicz et al.
11 questions
4
votes
1 answer
How does the optimization process in hindsight experience replay exactly work?
I was reading the research paper Hindsight Experience Replay. This is the paper that introduces Hindsight Experience Replay (HER), a technique that attempts to alleviate the infamous sparse reward problem. It is based on…
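The optimization step itself is just the underlying off-policy algorithm (DQN in the paper's bit-flipping experiments) run on the goal-augmented replay buffer. A minimal sketch of one goal-conditioned TD update, assuming PyTorch; the network, batch layout, and hyperparameters here are illustrative assumptions, not the paper's code:

    # Sketch: one TD update of a goal-conditioned DQN over a batch that
    # already contains HER-relabeled transitions. All names are illustrative.
    import torch
    import torch.nn as nn

    class QNet(nn.Module):
        def __init__(self, obs_dim, goal_dim, n_actions):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + goal_dim, 64), nn.ReLU(),
                nn.Linear(64, n_actions),
            )

        def forward(self, obs, goal):
            # HER conditions the value function on the goal via concatenation
            return self.net(torch.cat([obs, goal], dim=-1))

    def td_update(q, q_target, optimizer, batch, gamma=0.98):
        obs, act, rew, next_obs, goal, done = batch
        with torch.no_grad():
            # Standard one-step bootstrapped target; relabeled and original
            # transitions are treated identically by the optimizer
            target = rew + gamma * (1 - done) * q_target(next_obs, goal).max(dim=1).values
        pred = q(obs, goal).gather(1, act.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(pred, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()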

vikram71198
- 91
- 3
3
votes
1 answer
How does Hindsight Experience Replay learn from unsuccessful trajectories?
I am confused about how HER learns from unsuccessful trajectories. I understand that it creates 'fake' goals from failed trajectories that it can then learn from.
Ignoring HER for now, in the case where the robotic arm reaches the goal correctly, then…
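The key point fits in a few lines: a failed episode still demonstrates how to reach wherever the arm actually ended up, so each transition is stored a second time with the goal replaced by that achieved state, under which the trajectory counts as successful. A minimal sketch of the paper's "final" relabeling strategy (the buffer layout and field names are my assumptions):

    # Sketch of HER's "final" strategy: re-store each transition with the
    # goal swapped for the state actually achieved at the end of the episode.
    import numpy as np

    def sparse_reward(achieved, goal, eps=0.05):
        # Binary reward as in the HER paper: 0 on success, -1 otherwise
        return 0.0 if np.linalg.norm(achieved - goal) < eps else -1.0

    def relabel_final(episode, buffer):
        hindsight_goal = episode[-1]["achieved_goal"]  # where the arm ended up
        for t in episode:
            # original transition, with the original (possibly never-reached) goal
            buffer.append((t["obs"], t["act"], t["rew"], t["next_obs"], t["goal"]))
            # relabeled transition: under the hindsight goal the reward becomes
            # 0 near the end of the episode, providing a learning signal
            new_rew = sparse_reward(t["next_achieved_goal"], hindsight_goal)
            buffer.append((t["obs"], t["act"], new_rew, t["next_obs"], hindsight_goal))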

piccolo
- 173
- 5
3
votes
0 answers
Is this a good approach to solving Atari's "Montezuma's Revenge"?
I'm new to Reinforcement Learning. For an internship, I am currently training an agent on Atari's "Montezuma's Revenge" using a double Deep Q-Network with Hindsight Experience Replay (HER) (see also this article).
HER is supposed to alleviate the reward…

vikram71198
- 91
- 3
2
votes
0 answers
How can PPO be combined with HER?
I ask because PPO is apparently an on-policy algorithm, and the HER paper says that it can be combined with any off-policy algorithm. Yet I see GitHub projects that have combined them somehow.
How is this done? And is it reasonable?

profPlum
- 360
- 1
- 9
2
votes
1 answer
What is the difference between success rate and reward when dealing with binary and sparse rewards?
In OpenAI Gym, "reward" is defined as:
reward (float): amount of reward achieved by the previous action. The scale varies between environments, but the goal is always to increase your total reward.
I am training Hindsight Experience Replay on Fetch…
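Under the binary $-1/0$ reward the Fetch tasks use, episode return and success rate measure different things: the return counts steps spent away from the goal, while success rate only checks whether the goal holds at episode end. A small evaluation sketch, assuming the classic Gym step API and the info["is_success"] flag the Fetch environments expose:

    # Sketch: average return vs. success rate under a sparse -1/0 reward.
    # Assumes the pre-0.26 Gym step API and Fetch-style info["is_success"].
    def evaluate(env, policy, n_episodes=100):
        returns, successes = [], 0
        for _ in range(n_episodes):
            obs, done, ep_ret, info = env.reset(), False, 0.0, {}
            while not done:
                obs, rew, done, info = env.step(policy(obs))
                ep_ret += rew  # accumulates -1 for every off-goal step
            returns.append(ep_ret)
            successes += int(info.get("is_success", 0))  # goal held at the end?
        return sum(returns) / n_episodes, successes / n_episodes

Two policies can therefore share the same success rate yet have different returns if one reaches the goal faster.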

rrz0
- 263
- 2
- 7
2
votes
0 answers
How does Hindsight Experience Replay cope with multiple goals?
What if there are multiple goals? For example, let's consider the bit-flipping environment as described in the HER paper, with one small change: now the goal is not some specific configuration, but, let's say, for the last $m$ bits (e.g. $m=2$), I do…
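One concrete way to pose this variant, as a sketch (the class and names are mine, not the paper's): the goal constrains only the last $m$ bits, so the reward ignores the rest of the state, and HER would relabel with the last $m$ bits actually achieved:

    # Sketch: bit-flipping env where only the last m bits must match the goal.
    import numpy as np

    class BitFlipLastM:
        def __init__(self, n=10, m=2):
            self.n, self.m = n, m

        def reset(self):
            self.state = np.random.randint(2, size=self.n)
            self.goal = np.random.randint(2, size=self.m)  # constrains last m bits only
            return self.state.copy(), self.goal.copy()

        def step(self, i):
            self.state[i] ^= 1  # action i flips bit i
            done = bool(np.array_equal(self.state[-self.m:], self.goal))
            return self.state.copy(), (0.0 if done else -1.0), done

With this setup, the achieved goal used for relabeling is simply state[-m:].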

Savco
- 61
- 1
1
vote
2 answers
Why does HER not work with on-policy RL algorithms?
I'm wondering because I don't see what is wrong with just applying HER to an otherwise on-policy algorithm. If we do that, will the training stability just fall apart? And if so, why? My understanding is that on-policy is just a category…

profPlum
- 360
- 1
- 9
1
vote
1 answer
What does $r : \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$ mean in the article Hindsight Experience Replay, section 2.1?
Taken from section 2.1 in the article:
We consider the standard reinforcement learning formalism consisting of an agent interacting with an environment. To simplify the exposition we assume that the environment is fully observable. An environment…
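As a reading aid (my paraphrase, not the paper's exact wording): $r : \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$ is standard function-signature notation. It says that $r$ is a function taking a state $s \in \mathcal{S}$ together with an action $a \in \mathcal{A}$ and returning a real number $r(s, a)$, the reward for executing $a$ in $s$. In the paper's later multi-goal, sparse setting this specializes to a binary penalty of the form $r_g(s, a) = -[f_g(s') = 0]$, i.e. $-1$ at every step on which the goal predicate $f_g$ is not yet satisfied in the resulting state $s'$, and $0$ otherwise.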

WinnieThePooh
- 13
- 3
1
vote
0 answers
Why would DDPG with Hindsight Experience Replay not converge?
I am trying to train a DDPG agent augmented with Hindsight Experience Replay (HER) to solve the KukaGymEnv environment. The actor and critic are simple neural networks with two hidden layers (as in the HER paper).
More precisely, the…

Vedant Shah
- 105
- 1
- 7
1
vote
1 answer
What do the state features of KukaGymEnv represent?
I am trying to use DDPG augmented with Hindsight Experience Replay (HER) on pybullet's KukaGymEnv.
To formulate the feature vector for the goal state, I need to know what the features of the environment's state represent. To be precise, a typical…
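One way to answer this empirically is to print a sample observation and read the env's own getExtendedObservation() source, which is the authoritative definition of the layout. A quick sketch, assuming pybullet is installed and that the import path below is still current:

    # Sketch: inspect what KukaGymEnv actually returns as an observation.
    # Import path and constructor kwargs are assumptions; check your
    # pybullet version if they have moved.
    from pybullet_envs.bullet.kukaGymEnv import KukaGymEnv

    env = KukaGymEnv(renders=False, isDiscrete=False)
    obs = env.reset()
    print(env.observation_space.shape)  # dimensionality of the feature vector
    print(obs)  # layout is defined in KukaGymEnv.getExtendedObservation()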

Vedant Shah
- 105
- 1
- 7
0
votes
0 answers
How to establish baselines with different training loops
My objective is to test a new algorithm that I designed. However, I am not sure whether my methodology for training the networks is correct.
I am only concerned about the training loops:
In the first algorithm (DIAYN, a SAC-based algorithm), the…

Yash_Bit
- 1
- 1