
In most RL algorithms I have seen, there is a coefficient that reduces action exploration over time, to help convergence.

But in Actor-Critic and other algorithms used in continuous action spaces (A3C, DDPG, ...), the implementations I have seen (mainly using an Ornstein-Uhlenbeck process) produce noise that is correlated over time but never decayed.

The action noise is clipped to the range [-1, 1] and added to policy outputs that also lie in [-1, 1]. So I don't understand how this can work in environments with hard-to-obtain rewards.
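For concreteness, here is a minimal sketch of the kind of setup I mean; the hyperparameters and the stand-in policy output are illustrative, not taken from any particular implementation:

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: noise that is correlated over time
    (mean-reverting to mu), but whose scale (sigma) stays constant
    unless it is decayed explicitly."""
    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
        self.mu = mu * np.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.state = np.copy(self.mu)

    def sample(self):
        # Euler-Maruyama step of dx = theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1)
        dx = self.theta * (self.mu - self.state) * self.dt \
             + self.sigma * np.sqrt(self.dt) * np.random.standard_normal(len(self.state))
        self.state = self.state + dx
        return self.state

action_dim = 2                                         # e.g. a 2-D continuous action space
policy_output = np.tanh(np.random.randn(action_dim))   # stand-in for actor(state), already in [-1, 1]
noise = OUNoise(size=action_dim)

# noisy action, clipped back into the valid range
action = np.clip(policy_output + noise.sample(), -1.0, 1.0)
print(action)
```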

Any thoughts about this?

Loheek
  • What is stopping you from multiplying the noise value by some decaying coefficient? – Brale Apr 30 '19 at 16:34
  • I tried on some OpenAI environments (Pendulum and BipedalWalker) but could not get any convergence, whereas non-decayed OU noise did the job perfectly. I can barely explain this, so I should try a lot of different decay coefficients to be sure. That may take a long time, so I was hoping this question already had a known answer – Loheek Apr 30 '19 at 16:42
  • I'm not sure; I don't think it's set in stone, it probably depends on the environment. You could try experimenting with different setups. Maybe you were decaying exploration too fast: try keeping it non-decayed for some fixed number of steps and then decaying it slowly from that point, or start decreasing it once you see your performance improving and you are getting a lot of good episode outcomes (see the sketch after these comments) – Brale Apr 30 '19 at 16:49
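A rough illustration of the kind of schedule suggested in the comments; the numbers are purely illustrative and not tuned for any environment:

```python
# One possible decay schedule for the OU noise scale: keep exploration at full
# strength for `warmup_steps`, then decay it slowly towards a floor.
warmup_steps = 50_000     # illustrative values, not tuned
decay_rate = 0.99999
min_scale = 0.05

def noise_scale(step):
    """Multiplier applied to the OU noise at a given environment step."""
    if step < warmup_steps:
        return 1.0
    return max(min_scale, decay_rate ** (step - warmup_steps))

for step in (0, 50_000, 100_000, 500_000):
    print(step, round(noise_scale(step), 3))

# In the action-selection loop this would be used as:
#   action = np.clip(policy_output + noise_scale(step) * noise.sample(), -1.0, 1.0)
```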

0 Answers