I am trying to apply RL to a control problem, and I intend to use either Deep Q-Learning or SARSA.
I have two heating storage systems but only one heating device, so the RL agent is only allowed to heat up one of the two storages in each time slot. How can I do that?
I have two continuous variables $x(t)$ and $y(t)$, where $x(t)$ quantifies the fraction of maximum power used for heating up storage 1 and $y(t)$ quantifies the fraction of maximum power used for heating up storage 2.
Now, if $x(t) > 0$, then $y(t)$ has to be $0$, and vice versa, with both $x(t)$ and $y(t)$ taking values in $\{0\} \cup [0.25, 1]$. How can I tell the agent this?
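For concreteness, the constraint can be written as a feasibility check on an action pair $(x, y)$ (a minimal Python sketch; the function names are my own):

```python
def is_feasible(x: float, y: float) -> bool:
    """Return True if the action pair (x, y) satisfies the hard constraint."""
    def valid_level(v: float) -> bool:
        # Each power level must lie in {0} ∪ [0.25, 1].
        return v == 0.0 or 0.25 <= v <= 1.0
    # Both levels must be valid, and they may not both be positive
    # (only one storage can be heated per time slot).
    return valid_level(x) and valid_level(y) and not (x > 0 and y > 0)
```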
One way would be to adjust the actions after the RL agent has decided on them, using a separate control algorithm that overrules the agent's actions. I am wondering if and how this can also be done directly. I'll appreciate every comment.
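The override idea could be implemented as a projection of an arbitrary action pair onto the feasible set before it is applied. This is only a sketch of what I have in mind; the snapping threshold and the tie-breaking rule (keep the larger request) are my own assumptions:

```python
def project(x: float, y: float) -> tuple[float, float]:
    """Force (x, y) into {0} ∪ [0.25, 1] with at most one nonzero entry."""
    def snap(v: float) -> float:
        if v < 0.125:  # closer to "off" than to the minimum power level
            return 0.0
        return min(max(v, 0.25), 1.0)  # clamp into [0.25, 1]
    x, y = snap(x), snap(y)
    if x > 0 and y > 0:  # both active: keep only the larger request
        if x >= y:
            y = 0.0
        else:
            x = 0.0
    return x, y
```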
Update: Of course I could do this with a reward function, i.e. a penalty. But is there not a direct way of doing this? This is actually a so-called hard constraint: the agent is not allowed to violate it at all, as heating both storages at once is technically infeasible. So it would be better to tell the agent directly not to do this (if that is possible).
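Since Deep Q-Learning and SARSA both work with a discrete action set anyway, one "direct" option I can imagine is to reformulate the action space so that infeasible pairs cannot even be expressed: one discrete action selects either "off" or a (storage, power-level) pair. This is only a sketch; discretising the power into four levels is an illustrative assumption:

```python
# Every entry of ACTIONS is feasible by construction: at most one
# storage is assigned a nonzero power level from [0.25, 1].
ACTIONS = [("off", 0.0)] + [
    (storage, level)
    for storage in ("storage_1", "storage_2")
    for level in (0.25, 0.5, 0.75, 1.0)
]

def decode(action_index: int) -> tuple[float, float]:
    """Map a discrete action index to the continuous pair (x, y)."""
    target, level = ACTIONS[action_index]
    x = level if target == "storage_1" else 0.0
    y = level if target == "storage_2" else 0.0
    return x, y
```

With this encoding the agent simply picks one of nine indices, and the hard constraint can never be violated.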
Reminder: Can anyone tell me more about this issue? I'd highly appreciate any further comments and will be quite thankful for your help. I will also award a bounty for a good answer.