5

I have just started studying reinforcement learning. As far as I understand, existing algorithms search for the optimal solution/policy, but they do not let the programmer suggest how to find it (that is, guide the learning process). Such guidance would help the agent find the optimal solution faster.

Is it possible to guide the learning process in (deep) reinforcement learning?

nbro
Cristian M

2 Answers

3

The programmer already guides the RL algorithm (or agent) by specifying the reward function. However, as you correctly noticed, the reward function alone may not be enough for the agent to learn quickly and efficiently.

One way to address this inefficiency is to combine reinforcement learning with supervised learning. For example, the paper Deep Q-learning from Demonstrations (2017) by Todd Hester et al. describes an approach that does this.
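To make the idea of mixing supervised learning into Q-learning more concrete, here is a minimal sketch of a large-margin loss on demonstrated actions, added to an ordinary one-step TD loss. The function names, the margin of 0.8, and the weight `lambda_e` are assumptions chosen for illustration; this is not the full method from the paper, which also includes other terms.

```python
def large_margin_loss(q_values, expert_action, margin=0.8):
    """Supervised term: max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E).

    l(a_E, a) is 0 for the expert action and `margin` otherwise, so the loss
    is zero only when the expert action beats every other action by at least
    `margin`. The margin value is an assumption for this sketch.
    """
    augmented = [
        q + (0.0 if a == expert_action else margin)
        for a, q in enumerate(q_values)
    ]
    return max(augmented) - q_values[expert_action]


def td_loss(q_sa, reward, next_q_values, gamma=0.99, done=False):
    """Ordinary squared one-step TD error used by Q-learning."""
    target = reward if done else reward + gamma * max(next_q_values)
    return (target - q_sa) ** 2


def combined_loss(q_values, action, reward, next_q_values,
                  is_demo, lambda_e=1.0, gamma=0.99, done=False):
    """TD loss on every transition, plus the margin loss on demonstrations.

    `lambda_e` (hypothetical name) weights the supervised term.
    """
    loss = td_loss(q_values[action], reward, next_q_values, gamma, done)
    if is_demo:
        loss += lambda_e * large_margin_loss(q_values, action)
    return loss


if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]
    # The demonstrator chose action 0, but the network currently prefers 1,
    # so the supervised term is positive and pushes Q(s, 0) up.
    print(large_margin_loss(q, expert_action=0))
    # The demonstrator chose action 1, which already wins by the margin.
    print(large_margin_loss(q, expert_action=1))
```

The supervised term only applies to transitions that come from demonstrations; transitions the agent generates itself are trained with the TD loss alone.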

The paper Active Reinforcement Learning (2008) by Arkady Epshteyn et al. also tackles this problem, but by incorporating approximations of the MDP provided by domain experts.

There are probably many other possible solutions. In fact, all model-based RL algorithms arguably fall into this category, since they estimate or incorporate the dynamics of the environment in order to find a policy more efficiently.
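As a rough illustration of how a known or guessed model can guide learning, the sketch below runs value iteration on a hand-written approximate MDP and returns Q-values that could be used to initialise an agent before it interacts with the real environment. The tiny three-state chain, the data layout, and all names are assumptions made for this example; it does not reproduce the algorithm of either paper above.

```python
def plan_q_values(transitions, rewards, gamma=0.9, tol=1e-8):
    """Value iteration on an approximate model.

    transitions[s][a] is a list of (probability, next_state) pairs and
    rewards[s][a] is the expected immediate reward, both guessed by an expert.
    Returns a table q[s][a] computed from the converged state values.
    """
    n_states = len(transitions)

    def backup(s, a, values):
        expected_next = sum(p * values[s2] for p, s2 in transitions[s][a])
        return rewards[s][a] + gamma * expected_next

    values = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(backup(s, a, values) for a in range(len(transitions[s])))
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break

    return [
        [backup(s, a, values) for a in range(len(transitions[s]))]
        for s in range(n_states)
    ]


# Hypothetical expert guess for a chain 0 - 1 - 2 (action 0 = left, 1 = right).
# State 2 is absorbing; moving right from state 1 is believed to pay 1.
TRANSITIONS = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 0)], [(1.0, 2)]],
    [[(1.0, 2)], [(1.0, 2)]],
]
REWARDS = [
    [0.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
]

if __name__ == "__main__":
    q_init = plan_q_values(TRANSITIONS, REWARDS)
    for state, row in enumerate(q_init):
        print(state, row)
```

If the expert's model is wrong, these initial Q-values will be biased, so they are best treated as a starting point that ordinary RL updates then correct.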

nbro
2

Here are two closely related and interesting papers:

  1. Learning from Human Preferences
  2. Improving Reinforcement Learning with Human Input
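For readers who want a feel for how human preferences can act as guidance, here is a small sketch of one formulation commonly used in preference-based RL: a learned reward model scores two trajectory segments, and a logistic (Bradley-Terry style) model turns the difference of their summed rewards into the probability that a human prefers one over the other. The function names and the toy numbers are assumptions for illustration, and the sketch is not taken from either paper listed above.

```python
import math


def preference_probability(rewards_a, rewards_b):
    """Probability that segment A is preferred over segment B.

    Each argument is the list of per-step rewards that a (hypothetical)
    reward model predicts for that segment. The probability is the logistic
    function of the difference between the two summed rewards.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    return 1.0 / (1.0 + math.exp(-diff))


def preference_loss(rewards_a, rewards_b, human_prefers_a):
    """Cross-entropy between the predicted preference and the human label."""
    p_a = preference_probability(rewards_a, rewards_b)
    p_label = p_a if human_prefers_a else 1.0 - p_a
    return -math.log(p_label)


if __name__ == "__main__":
    segment_a = [0.2, 0.5, 0.3]   # predicted rewards, made-up values
    segment_b = [0.1, 0.0, 0.1]
    print(preference_probability(segment_a, segment_b))
    # The loss is smaller when the reward model agrees with the human.
    print(preference_loss(segment_a, segment_b, human_prefers_a=True))
    print(preference_loss(segment_a, segment_b, human_prefers_a=False))
```

Minimising this loss over many labelled pairs fits the reward model, which can then stand in for a hand-written reward function when training the agent.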
Ray Walker