Solving the dead time problem for control using reinforcement learning

Question

There are several situations in which reinforcement learning can be used as a means of control. The action is, for example, the target temperature setpoint (which in many cases changes with time), and the state is, for example, the current temperature and other variables. The policy is then the controller that is learned through reinforcement learning.

Since real-world systems have dead time (input lag) and time delay, how can one tackle this problem when using reinforcement learning as a means of control? Thank you.

Could you explain why you think a delay between action and observable effects of the action would be a problem? Generally in reinforcement learning it is not an issue, but maybe there is some special case for your example that I am missing. — Neil Slater, Oct 30 '19 at 15:29
Related (maybe a duplicate?) https://ai.stackexchange.com/questions/8267/dealing-with-lags-in-reinforcement-learning — Neil Slater, Oct 30 '19 at 15:31
I think you are right. This is the same problem that I asked about here. I didn't find it before, but thank you. — JianNius, Oct 30 '19 at 16:16
Possible duplicate of [Dealing with Lags in Reinforcement Learning](https://ai.stackexchange.com/questions/8267/dealing-with-lags-in-reinforcement-learning) — JianNius, Oct 30 '19 at 16:17
OK. If lag effects are extreme, such that observable state does not change at all, but "momentum" is building due to actions taken on each time step, and the target may overshoot, then you may need to make some changes to the state representation. If you think that applies in your case, then it may not be a duplicate. If so, add that detail to the question . . . otherwise it is good to have duplicates because it is another way to word the same question, so more chance of someone else looking to find the same answer. — Neil Slater, Oct 30 '19 at 16:41
Yes, that is right. The lag effects are so large that the response only becomes noticeable after a certain number of time steps, but it will not overshoot in my case. — JianNius, Oct 30 '19 at 17:36
You need lag+ "momentum" for this to be a worry. If the subsystem that the agent is controlling is a traditional control with damping already built in, the lag won't matter. If the agent is directly controlling a heating element a few rooms away though, it may matter more. — Neil Slater, Oct 30 '19 at 17:39
According to your terminology, maybe you are right to add "momentum". I used the term dead time from control systems because a system normally has this property, and when the dead time is large it poses a problem when using reinforcement learning to control it. In my terminology, time delay corresponds to the delay due to the dynamics of the system. Thanks — JianNius, Oct 30 '19 at 17:44
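
The state-representation change mentioned in the comments could look something like the sketch below. It is not taken from the thread: it assumes one common approach, which is to append the last `k` actions to the observation so that the agent can "see" commands that have been issued but have not yet affected the temperature. `DelayedHeater`, `ActionHistoryState`, and all constants are hypothetical names and values chosen only for illustration.

```python
from collections import deque


class DelayedHeater:
    """Toy plant (hypothetical): a heater command reaches the room `dead_time` steps late."""

    def __init__(self, dead_time=3, temp=15.0):
        self.temp = temp
        # Commands already issued but not yet acting on the room.
        self.pipeline = deque([0.0] * dead_time)

    def step(self, power):
        self.pipeline.append(power)
        applied = self.pipeline.popleft()  # command issued `dead_time` steps ago
        # Assumed first-order dynamics: heating minus loss toward a 10 degree ambient.
        self.temp += 0.5 * applied - 0.1 * (self.temp - 10.0)
        return self.temp


class ActionHistoryState:
    """Wraps a plant and exposes (temperature, last k actions) as the state."""

    def __init__(self, plant, k):
        self.plant = plant
        self.history = deque([0.0] * k, maxlen=k)  # oldest action first

    def state(self):
        return (self.plant.temp, *self.history)

    def step(self, action):
        self.plant.step(action)
        self.history.append(action)
        return self.state()


env = ActionHistoryState(DelayedHeater(dead_time=3), k=3)
first = env.step(1.0)       # heat command issued, room does not respond yet
for _ in range(2):
    env.step(0.0)
before = env.plant.temp
after = env.step(0.0)[0]    # the command from three steps ago finally takes effect
```

In this toy setup, if `k` is at least the dead time, the augmented state contains every command still "in flight", so the next temperature depends only on the current augmented state and the new action. Whether this is needed in practice depends on the system, as discussed in the comments above.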
