I have a scheduling problem and would like to know whether I can use reinforcement learning (RL) to solve it, and if so, which kind of RL. My problem is a mixed-integer linear optimization problem. I have a building with an electric heating device that converts electricity into heat. The action (decision variable) is $x(t)$, the electrical power of the heating device. The device has to make one decision for every minute of the day, so in total there are $24$ hours $\times 60$ minutes $= 1440$ variables. Each of these variables is continuous and can take any value between $0$ W and $2000$ W.
The state space contains several continuous variables:
- Externally given, time-varying electricity price (one value per minute): between $0$ and $100$ cents per kWh
- Internal temperature of the building: can in principle take any value, but is constrained to stay between $20$ °C and $22$ °C
- Heat demand of the building: any value between $0$ W and $10{,}000$ W
- Varying "efficiency" of the electrical heating device: between $1$ and $4$ (depending on the outside temperature)
The goal is to minimize the electricity costs (under a flexible electricity tariff) without violating the temperature constraint of the building. As stated above, this problem can be solved by mathematical optimization (as a mixed-integer linear program), but I would like to know whether it can also be solved with reinforcement learning. I am new to reinforcement learning, so I do not know how to approach this, and I have some concerns.
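To make the setup concrete, here is a minimal sketch of how I imagine the problem as an RL environment. It only uses NumPy and loosely follows the usual `reset`/`step` convention. The class name `HeatingEnv`, the first-order temperature dynamics, the thermal capacity, the penalty weight and the start temperature are all assumptions of mine for illustration; only the bounds ($0$ to $2000$ W, $20$ to $22$ °C, $1440$ steps) come from the description above.

```python
import numpy as np


class HeatingEnv:
    """Toy one-day heating environment, one decision per minute (a sketch)."""

    MAX_POWER_W = 2000.0        # action bound from the question
    T_MIN, T_MAX = 20.0, 22.0   # comfort band in deg C from the question
    STEPS = 24 * 60             # 1440 decisions per day

    def __init__(self, price_ct_per_kwh, demand_w, efficiency,
                 heat_capacity_wh_per_k=5000.0, penalty_per_k=10.0,
                 t_start=21.0):
        # Exogenous per-minute profiles, each of length STEPS.
        self.price = np.asarray(price_ct_per_kwh, dtype=float)
        self.demand = np.asarray(demand_w, dtype=float)
        self.efficiency = np.asarray(efficiency, dtype=float)
        # ASSUMPTIONS: thermal capacity, penalty weight, start temperature.
        self.heat_capacity = heat_capacity_wh_per_k
        self.penalty_per_k = penalty_per_k
        self.t_start = t_start
        self.reset()

    def _observe(self):
        i = min(self.t, self.STEPS - 1)  # stay in range after the last step
        return np.array([self.price[i], self.temp,
                         self.demand[i], self.efficiency[i]])

    def reset(self):
        self.t = 0
        self.temp = self.t_start
        return self._observe()

    def step(self, power_w):
        i = self.t
        power_w = float(np.clip(power_w, 0.0, self.MAX_POWER_W))
        # Cost in cents: price [ct/kWh] * energy [kWh] used in one minute.
        cost = self.price[i] * power_w / 1000.0 / 60.0
        # ASSUMED first-order dynamics: net heat [W] over one minute,
        # divided by the thermal capacity [Wh/K].
        net_heat_w = self.efficiency[i] * power_w - self.demand[i]
        self.temp += net_heat_w / 60.0 / self.heat_capacity
        # Soft constraint: penalise every kelvin outside the comfort band.
        violation = (max(self.T_MIN - self.temp, 0.0)
                     + max(self.temp - self.T_MAX, 0.0))
        reward = -cost - self.penalty_per_k * violation
        self.t += 1
        done = self.t >= self.STEPS
        return self._observe(), reward, done


# Usage with constant (made-up) profiles.
env = HeatingEnv(price_ct_per_kwh=np.full(1440, 30.0),
                 demand_w=np.full(1440, 4000.0),
                 efficiency=np.full(1440, 2.0))
obs = env.reset()
obs, reward, done = env.step(2000.0)
```

In this sketch the temperature constraint is only a penalty term in the reward, not a hard constraint as in the optimization model. I am not sure whether that is the right way to handle it.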
Here I have a very large state space with continuous values, so I can't build a comprehensive $Q$-table as there are too many values. Further, I am not sure whether the problem is a dynamic programming problem (as most, or all, reinforcement learning problems seem to be). From an optimization point of view, it is a mixed-integer linear problem.
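To put a rough number on the $Q$-table concern, here is a back-of-the-envelope calculation. The bin counts are arbitrary assumptions of mine; a finer or coarser grid would change the result accordingly.

```python
import math

# ASSUMED number of bins per quantity; these counts are arbitrary.
state_bins = {
    "price": 100,        # roughly 1-cent resolution
    "temperature": 100,
    "heat_demand": 100,  # roughly 100 W resolution
    "efficiency": 30,    # roughly 0.1 resolution
}
action_bins = 20         # roughly 100 W resolution for x(t)

n_states = math.prod(state_bins.values())
q_entries = n_states * action_bins
q_bytes = q_entries * 8  # one 64-bit float per table entry

print(f"{n_states:,} states, {q_entries:,} Q-values, {q_bytes / 1e9:.1f} GB")
```

Even this coarse grid gives 600 million $Q$-values (about 4.8 GB as 64-bit floats), and that is without the time of day as part of the state.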
Can anyone tell me if and how I could solve this using RL? If it is possible, I would like to know which type of RL method is suitable. Maybe deep Q-learning, or Monte Carlo policy iteration, or SARSA? Should I use model-free or model-based RL for this?
Reminder: Does anybody know whether and how I can use reinforcement learning for this problem? I'd highly appreciate any comments and would be thankful for further insights.