I have a control problem for the heating device of a building. The goal is to minimize the electricity cost for one day when the electricity price changes every hour (more details can be seen here: Reinforcement learning applicable to a scheduling problem?). Although this is basically a scheduling problem, I want to implement it as a control problem with one decision per time step.
Now, I have 2 questions:
1. Is it possible to take future values (e.g. of the electricity price) into account when choosing the control action for each time slot? For example, if the agent knows that the price will fall significantly in 2 hours, it should tend to postpone its electricity consumption until then to get closer to the optimal solution.
2. Related to 1: Is it possible to give the reward only at the end of the day instead of every hour (although the control actions are still taken every hour)? I am worried that a reward at every hour might lead to greedy behaviour, which often gives bad overall results.
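
A minimal sketch of what both ideas could look like in code, under stated assumptions: the class `HeatingDayEnv` and the parameters `lookahead`, `power_kw` and `sparse_reward` are hypothetical names introduced only for illustration, the price forecast is assumed to be known exactly, and the building's heat demand and temperature constraints are deliberately left out. The observation contains the current price plus the next few prices (question 1), and the reward can be switched between hourly costs and a single end-of-day cost (question 2).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HeatingDayEnv:
    """Toy one-day environment (hypothetical, for illustration only)."""
    prices: List[float]         # electricity price for every hour of the day
    lookahead: int = 3          # number of future prices visible to the agent
    power_kw: float = 2.0       # assumed electrical power while heating is on
    sparse_reward: bool = True  # True: reward only at the end of the day
    hour: int = 0
    total_cost: float = 0.0

    def reset(self) -> Tuple[float, ...]:
        self.hour = 0
        self.total_cost = 0.0
        return self._observation()

    def _observation(self) -> Tuple[float, ...]:
        # Current price plus the next `lookahead` prices; near the end of
        # the day the window is padded with the last known price.
        window = self.prices[self.hour:self.hour + self.lookahead + 1]
        window = window + [self.prices[-1]] * (self.lookahead + 1 - len(window))
        return (self.hour, *window)

    def step(self, heat_on: bool) -> Tuple[Optional[Tuple[float, ...]], float, bool]:
        cost = self.prices[self.hour] * self.power_kw if heat_on else 0.0
        self.total_cost += cost
        self.hour += 1
        done = self.hour >= len(self.prices)
        if self.sparse_reward:
            # Single reward at the end of the day: negative total cost.
            reward = -self.total_cost if done else 0.0
        else:
            # Hourly reward: negative cost of this hour.
            reward = -cost
        obs = None if done else self._observation()
        return obs, reward, done


env = HeatingDayEnv(prices=[0.30, 0.25, 0.10, 0.20], lookahead=2)
obs = env.reset()  # (hour, current price, next two prices)
done = False
while not done:
    # Placeholder policy: heat only if no cheaper hour is visible ahead.
    heat_on = obs[1] <= min(obs[2:])
    obs, reward, done = env.step(heat_on)
```

In this sketch the sum of the hourly rewards over the day equals the single end-of-day reward, so without discounting both settings define the same total return; they differ only in when the agent receives the signal.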