
I have a control problem for a heating device of a building, with the goal of minimizing the electricity costs for one day under a price for electricity that varies every hour (more details can be seen here: Reinforcement learning applicable to a scheduling problem?). Although the problem is basically a scheduling problem, I want to implement it as a control problem with a decision at every time step.

Now, I have 2 questions:

  1. Is it possible to somehow consider future values (e.g. of the electricity price) when choosing a control action for every time slot? For example, if the agent knows that the price will fall significantly in 2 hours, it should tend to shift its electricity consumption to that time to get closer to the optimal solution.

  2. Related to 1: Is it possible to give the reward only at the end of the day instead of every hour (even though the control actions are taken every hour)? Receiving a reward at every hour might lead to greedy behaviour, which often produces poor overall results.

PeterBe

1 Answer


Typically in a control problem, it is OK to include data about a future event, if it can be reliably predicted at the time that a decision is required.

This would include known rules such as a pricing schedule. You could even use some feature engineering to help the agent, e.g. a state feature that counts down to the next price change, or one that presents the upcoming price as a change relative to the current price. For instance, it would be fine to have a couple of features that pre-calculated "in 10 minutes, the price will change by -0.003 per kWh" - whether or not that helps your agent in practice I could not say.
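As a sketch of that kind of feature engineering (the function name, lookahead window, and prices here are made up for illustration), one could derive "time until next change" and "size of change" features from a pricing schedule that is known in advance:

```python
def price_features(schedule, t, horizon=2):
    """Pre-computed features describing the next price change.

    schedule: list of hourly prices known in advance.
    t: index of the current hour.
    Returns (hours_until_next_change, relative_change); if no change
    occurs within `horizon` hours, the change is reported as 0.0.
    """
    current = schedule[t]
    last = len(schedule) - 1
    for dt in range(1, min(horizon, last - t) + 1):
        if schedule[t + dt] != current:
            # round to avoid floating-point noise in the feature
            return dt, round(schedule[t + dt] - current, 6)
    return horizon, 0.0


# Example: the price is agreed to drop from 0.30 to 0.18 in 2 hours,
# so the agent can see the drop coming and delay consumption.
schedule = [0.30, 0.30, 0.18, 0.18]
print(price_features(schedule, 0))  # -> (2, -0.12)
```

These two numbers can simply be appended to the rest of the state vector; the agent then has the chance to learn to wait for the cheaper slot.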

What you should not include is any resolution of random variables in advance. It would be OK to predict these using a model, or give the agent details about the distribution, but it is not OK to work with all data from all time steps already resolved, when the agent will in reality be required to make a decision without that data. An example of this kind of data is the measured internal or external temperature on future time steps. (Predicted temperature from a weather forecast would be OK though)
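To make the distinction concrete, here is a sketch of an observation builder (all names and structure are hypothetical) that uses only information available at decision time - current measurements, the agreed schedule, and a forecast - and deliberately never reads the resolved future values:

```python
def build_observation(t, schedule, indoor_temp_log, outdoor_forecast):
    """Observation for the controller at hour t.

    Allowed inputs:
    - indoor_temp_log[t]: the temperature measured *now*,
    - schedule: the pricing schedule, agreed in advance,
    - outdoor_forecast[t]: a weather *prediction* for the coming hour.
    Not allowed: indoor_temp_log[t + 1:], i.e. the resolved future of a
    random variable that the real controller could never observe.
    """
    last = len(schedule) - 1
    return {
        "hour": t,
        "indoor_temp": indoor_temp_log[t],              # current measurement
        "price_now": schedule[t],                       # known in advance
        "price_next": schedule[min(t + 1, last)],       # known in advance
        "forecast_outdoor": outdoor_forecast[t],        # prediction, not actual
    }


obs = build_observation(
    t=1,
    schedule=[0.30, 0.30, 0.18],
    indoor_temp_log=[20.5, 21.0, 21.4],  # entries after index 1 exist in the
    outdoor_forecast=[5.0, 4.0, 3.5],    # log, but are never read at t=1
)
print(obs["indoor_temp"])  # -> 21.0
```

The same rule applies when generating training episodes: the full temperature trace may exist in your dataset, but the observation at step t must be built as if the future part of it did not.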

This constraint - not showing the controller the whole future when it is making a decision - applies to other controller optimisation methods as well as to reinforcement learning. No real-world controller has perfect knowledge of how stochastic outcomes will turn out, or the ability to rewind time and revise a past decision based on observations made after that decision. However, to repeat the point made at the start, it is fine to base decisions on predictions of what will happen, where it is reasonable to have that knowledge in advance, and such predictions can be very accurate in some environments. A pricing schedule agreed in advance with your power provider is exactly the kind of knowledge it is fine to use.

Neil Slater