Unable to train Coach for Banana-v0 Gym environment

Question

I have just started playing with reinforcement learning and, starting from the basics, I'm trying to figure out how to solve the Banana gym environment (Banana-v0) with Coach.

Essentially, the Banana-v0 environment represents a banana shop that buys a banana for \$1 on day 1 and has 3 days to sell it at any price between \$0 and \$2, where a lower price means a higher chance of selling. The reward is the sale price minus the buy price. If the banana hasn't sold by the end of day 3, it is discarded and the reward is -1 (the buy price, with no sale proceeds). That's pretty simple.
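The rules above fit in a few lines of code. The following is a minimal sketch, not the actual Banana-v0 source: the class `ToyBananaEnv` is hypothetical, and since the exact mapping from price to sale chance isn't stated here, it assumes a linear model, P(sale) = 1 - price/2 (a \$0 price always sells, a \$2 price never does).

```python
import random


class ToyBananaEnv:
    """Hypothetical toy version of the Banana-v0 rules described above (not the real env).

    ASSUMPTION: the sale probability is linear in price, P(sale) = 1 - price / 2,
    so a $0 price always sells and a $2 price never does.
    """

    BUY_PRICE = 1.0
    MAX_PRICE = 2.0
    DAYS = 3

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.day = 1

    def reset(self):
        """Start a new episode: the banana has just been bought on day 1."""
        self.day = 1
        return self.day

    def step(self, price):
        """Offer the banana at `price` for one day; return (day counter after the step, reward, done)."""
        price = min(max(price, 0.0), self.MAX_PRICE)  # clip to the allowed range
        sell_probability = 1.0 - price / self.MAX_PRICE
        if self.rng.random() < sell_probability:
            # Sold: reward is the sale price less the buy price.
            return self.day, price - self.BUY_PRICE, True
        if self.day == self.DAYS:
            # Unsold after the last day: discarded, so the buy price is lost.
            return self.day, -self.BUY_PRICE, True
        self.day += 1  # unsold, try again tomorrow
        return self.day, 0.0, False


if __name__ == "__main__":
    # Evaluate a naive policy that always asks $1.
    env = ToyBananaEnv(seed=0)
    episodes = 100_000
    total = 0.0
    for _ in range(episodes):
        env.reset()
        done = False
        while not done:
            _, reward, done = env.step(1.0)
            total += reward
    print(total / episodes)
```

Under this assumed model, always asking \$1 loses the banana with probability 0.5³ = 0.125 and otherwise breaks even, so the printed average should be close to -0.125.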

Ideally, the algorithm should learn to set a high price on day 1 and reduce it each day the banana doesn't sell.
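Under an assumed sale model of P(sale) = 1 - price/2 (which may differ from the real environment), the ideal schedule can be computed directly by backward induction. On the last day an unsold banana is worth nothing; on each earlier day the best price trades off the sale proceeds against the expected value of waiting another day. The names `expected_proceeds` and `optimal_prices` and the \$0.01 price grid are illustrative choices.

```python
def expected_proceeds(price, value_if_unsold, max_price=2.0):
    """Expected sale proceeds for one day's offer at `price`.

    ASSUMPTION: P(sale) = 1 - price / max_price (linear toy model).
    If the banana doesn't sell, we get `value_if_unsold` from the later days.
    """
    sell = 1.0 - price / max_price
    return sell * price + (1.0 - sell) * value_if_unsold


def optimal_prices(days=3, buy_price=1.0, max_price=2.0, steps=200):
    """Backward induction over a price grid; returns (prices by day, expected net reward)."""
    grid = [max_price * i / steps for i in range(steps + 1)]  # $0.00 .. $2.00 in $0.01 steps
    value_if_unsold = 0.0  # after the last day an unsold banana is discarded: no proceeds
    prices = []
    for _ in range(days):  # walk backwards from the last day to day 1
        best = max(grid, key=lambda p: expected_proceeds(p, value_if_unsold, max_price))
        value_if_unsold = expected_proceeds(best, value_if_unsold, max_price)
        prices.append(best)
    prices.reverse()  # report day 1 first
    return prices, value_if_unsold - buy_price


if __name__ == "__main__":
    prices, net = optimal_prices()
    print([round(p, 2) for p in prices], round(net, 4))
```

With these assumptions it returns prices of roughly \$1.39, \$1.25 and \$1.00 for days 1 to 3, which matches the high-then-decreasing intuition, and an expected net reward of about -0.033. Even the best policy loses slightly on average under this model, so progress would show up as the average reward rising toward zero from below rather than becoming clearly positive.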

To start, I took the Coach-bundled CartPole_ClippedPPO.py and CartPole_DQN.py preset files and modified them to run the Banana-v0 gym environment.

The trouble is that I don't see any learning progress regardless of what I try, even after running around 50,000 episodes. By comparison, the CartPole gym trains successfully in under 500 episodes.

I would have expected some improvement after 50k episodes on a task as simple as Banana.

Is it because the Banana-v0 rewards are not predictable? That is, whether the banana sells is still determined by a random number (with the chance of success based on the price).
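The effect of that randomness can be quantified. A fixed three-day price schedule has only four possible outcomes (sold on day 1, 2 or 3, or discarded), so the mean and standard deviation of the episode reward can be computed exactly. This sketch again assumes P(sale) = 1 - price/2, and `return_stats` is a hypothetical helper, not part of Coach or gym.

```python
import math


def return_stats(prices, buy_price=1.0, max_price=2.0):
    """Exact mean and standard deviation of the episode reward for a fixed price schedule.

    ASSUMPTION: same toy sale model, P(sale at price p) = 1 - p / max_price.
    """
    outcomes = []  # (probability, reward) pairs
    unsold = 1.0  # probability the banana is still unsold at the start of the day
    for p in prices:
        sell = 1.0 - p / max_price
        outcomes.append((unsold * sell, p - buy_price))  # sold on this day
        unsold *= 1.0 - sell
    outcomes.append((unsold, -buy_price))  # discarded after the last day
    mean = sum(prob * r for prob, r in outcomes)
    var = sum(prob * (r - mean) ** 2 for prob, r in outcomes)
    return mean, math.sqrt(var)


if __name__ == "__main__":
    print("flat $1:", return_stats([1.0, 1.0, 1.0]))
    print("decreasing:", return_stats([1.39, 1.25, 1.0]))
```

Under this assumed model, the per-episode standard deviation of the decreasing schedule (about 0.53) is several times larger than the gap in mean reward between a flat \$1 policy and that schedule (about 0.09). Single-episode rewards therefore look like noise, and returns need to be averaged over a few hundred episodes before such an improvement becomes visible on a reward curve.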

Where should I go from here? How do I identify which Coach agent algorithm I should start with and try to tune?

Which coach algorithm did you use? If it was one that worked well for CartPole, it may need to be changed to work with a new environment. — Neil Slater, Nov 27 '19 at 08:09
@NeilSlater I tried **ClippedPPO** and **DQN**. If either showed at least some improvement, I could take it from there and try to tune it, but since I didn't observe any noticeable improvement, I doubt these two are suited to Banana. Maybe some other algorithm would do better? Not sure which one to try. — KeepLearning, Nov 27 '19 at 20:30
It's probably not the choice of algorithm but the hyperparameters that are working against you. If I get time I will take a deeper look and try to answer. — Neil Slater, Nov 28 '19 at 08:04
