I'm trying to teach a humanoid agent how to stand up after falling. The episode starts with the agent lying on its back on the floor, and its goal is to stand up in the shortest amount of time.
But I'm having trouble with reward shaping. I've tried several different reward functions, but they all end the same way: the agent quickly learns to sit up (i.e., lift its torso), then gets stuck in that local optimum indefinitely.
Any ideas or advice on how to design a good reward function for this scenario?
A few reward functions I've tried so far (a rough sketch of what I mean is below the list):
- current_height / goal_height
- current_height / goal_height - 1
- current_height / goal_height - reward_prev_timestep
- (current_height / goal_height)^N (tried several values of N)
- ...
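
To make the first few concrete, here's a minimal sketch of the per-step reward computations I mean. Variable names like `torso_height` and `goal_height` are illustrative placeholders, not tied to any particular simulator API:

```python
# Minimal sketch of the height-based rewards listed above
# (illustrative names only, not an actual environment implementation).

def height_ratio_reward(torso_height: float, goal_height: float) -> float:
    """Fraction of the target standing height reached, roughly in [0, 1]."""
    return torso_height / goal_height

def shifted_height_reward(torso_height: float, goal_height: float) -> float:
    """Same ratio shifted down by 1, so the reward stays negative
    until the torso reaches the goal height."""
    return torso_height / goal_height - 1.0

def progress_reward(torso_height: float, goal_height: float,
                    prev_reward: float) -> float:
    """Difference-style variant: rewards only the improvement over the
    previous timestep's value."""
    current = torso_height / goal_height
    return current - prev_reward
```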