There are several situations in which reinforcement learning can be used as a means of control. The action is, for example, the target (setpoint) temperature, which in many cases changes over time, and the state is, for example, the current temperature together with other variables. The policy is then the controller that is learned through reinforcement learning.
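To make the setup concrete, here is a minimal sketch of the kind of process I have in mind, including the dead time asked about below. It is a toy, not a model of any real plant: the class name `DelayedThermalEnv`, the first-order dynamics, and the parameters `delay_steps` and `gain` are all assumptions chosen for illustration.

```python
from collections import deque


class DelayedThermalEnv:
    """Toy thermal process (hypothetical): a setpoint only takes effect
    after `delay_steps` control steps, which models dead time."""

    def __init__(self, delay_steps=3, gain=0.2, initial_temp=20.0):
        self.delay_steps = delay_steps    # assumed dead time, in control steps
        self.gain = gain                  # assumed first-order response rate
        self.initial_temp = initial_temp
        self.reset()

    def reset(self):
        self.temp = self.initial_temp
        # Setpoints that were issued but have not reached the process yet.
        self.pending = deque([self.initial_temp] * self.delay_steps)
        return self.temp

    def step(self, setpoint):
        # The action chosen now is queued; the process sees an older one.
        self.pending.append(setpoint)
        applied = self.pending.popleft()
        # First-order lag toward the setpoint that is actually applied.
        self.temp += self.gain * (applied - self.temp)
        return self.temp


env = DelayedThermalEnv(delay_steps=3)
for t in range(5):
    print(t, env.step(30.0))
```

In this sketch the temperature does not move for the first three steps even though the setpoint was changed at step 0, so a state made up of the current temperature alone does not tell the agent which of its past actions are still "in flight".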
Real-world systems have dead time (input lag) and other time delays. How can one deal with these when using reinforcement learning as a means of control? Thank you.