Where to start with reinforcement learning on actions and rewards sampled from a slow, ongoing real-life system

Question

I would like some pointers, possible projects that solve conceptually similar goals, code examples or tutorials.

I am trying to build a system that starts or stops ventilation of a given space based on various outside and inside metrics (humidity, temperature, time, etc.), with the goal of lowering relative humidity.

Actuating the system based on simple physics gave me questionable results, because I am not able to model the full dynamics of the system.

I was wondering whether reinforcement learning could help me learn a good policy. The system should learn from real-life data, actuating the real ventilation, with the obvious slowness of such a system.

I am quite new to AI, but able to understand and create a simple OpenAI Gym environment. I am not even sure whether and how something like this is achievable with such a limited data flow. I am currently recording and analyzing all the data I can measure, together with some more or less random ventilation sessions. I am sure there are better ways to do this.

RL is very bad at learning in real time, it basically only works through simulators and requires a massive amount of data unfortunately. Generalization is a fundamental problem. — FourierFlux, Aug 17 '21 at 22:25

score 1 · Answer 1 · answered Aug 17 '21 at 20:16

First, you'd need to model your real environment mathematically, probably with some differential equations.
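As a rough illustration of what such a differential-equation model might look like, here is a minimal sketch of a first-order moisture balance, dh/dt = (k_leak + k_vent * u) * (h_out - h) + source, integrated with explicit Euler steps. The model form, the function name `simulate_humidity`, and every parameter name and default value are assumptions made up for illustration, not something derived from your system. Here h stands for indoor absolute humidity (say, g/m^3), u is the fan state, and time is in hours.

```python
def simulate_humidity(h_in0, h_out, fan_on, k_vent=0.5, k_leak=0.02,
                      source=0.05, dt=0.1, steps=100):
    """Euler-integrate dh/dt = (k_leak + k_vent * u) * (h_out - h) + source.

    All parameters are hypothetical placeholders:
      k_vent  air-exchange rate added by the fan (1/h)
      k_leak  passive air-exchange rate with the fan off (1/h)
      source  indoor moisture generation (g/m^3 per hour)
    """
    u = 1.0 if fan_on else 0.0
    h = h_in0
    trace = [h]
    for _ in range(steps):
        dh = (k_leak + k_vent * u) * (h_out - h) + source
        h += dt * dh
        trace.append(h)
    return trace


if __name__ == "__main__":
    fan_off = simulate_humidity(h_in0=12.0, h_out=6.0, fan_on=False)
    fan_on = simulate_humidity(h_in0=12.0, h_out=6.0, fan_on=True)
    print(f"after 10 h: fan off {fan_off[-1]:.2f}, fan on {fan_on[-1]:.2f}")
```

A real room will likely need more terms (temperature coupling, moisture buffering in walls and furniture), which is exactly where the unknown parameters come from.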

Once you have a good model, you still won't know the parameters of your real case. I can see two different approaches:

1. Theoretical + experimental: measure real data empirically to try to find those parameters. (Make a simple PID controller.)
2. Make a robust and general policy that adapts to any parameter values.

The first alternative is very straightforward: you basically have just three parameters to adjust (the PID gains), and there are clear methods for tuning them depending on how your data behaves.
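To make the "three parameters" concrete, here is a minimal sketch of a textbook PID controller. The class name `PID`, the helper `fan_command`, and the rule "switch the fan on when the controller output is positive" are my own illustrative assumptions. An on/off fan needs some thresholding like this, and in practice you would probably add hysteresis to avoid rapid switching.

```python
class PID:
    """Textbook PID: output = kp * e + ki * integral(e) + kd * de/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        # Accumulate the integral term
        self.integral += error * dt
        # No derivative on the first call, since there is no previous error yet
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def fan_command(pid, rh_measured, rh_target, dt):
    """Hypothetical on/off rule: ventilate while the PID output is positive."""
    error = rh_measured - rh_target  # positive when the room is too humid
    return pid.update(error, dt) > 0.0


if __name__ == "__main__":
    pid = PID(kp=1.0, ki=0.1, kd=0.0)  # placeholder gains, to be tuned on data
    print(fan_command(pid, rh_measured=70.0, rh_target=55.0, dt=1.0))
```

The gains `kp`, `ki` and `kd` are the three values you would tune against your recorded data.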

Since you are on AI.exchange, I assume you're choosing the second way. I consider it much harder to implement, but also more fun. You'll need to create a highly parametrized environment, build a reinforcement learning model, and train it on all different kinds of parameters (just make sure your real parameters are somewhere inside that range).

If you do it all right, you should end up with a robust, general policy that behaves well under general conditions.
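A minimal sketch of the "highly parametrized environment" idea (often called domain randomization) could look like the following. It mimics the Gym-style `reset()`/`step()` interface in plain Python without importing Gym. The class name, the toy dynamics, the parameter ranges, and the reward (negative indoor absolute humidity minus a small fan cost, used as a stand-in for relative humidity) are all assumptions for illustration.

```python
import random


class RandomizedVentilationEnv:
    """Toy simulator whose physical parameters are re-drawn every episode.

    Hypothetical dynamics: dh/dt = (k_leak + k_vent * u) * (h_out - h) + source
    """

    def __init__(self, k_vent_range=(0.2, 1.0), k_leak_range=(0.0, 0.05),
                 source_range=(0.0, 0.1), dt=0.1, horizon=100,
                 fan_cost=0.01, seed=None):
        self.k_vent_range = k_vent_range
        self.k_leak_range = k_leak_range
        self.source_range = source_range
        self.dt = dt
        self.horizon = horizon
        self.fan_cost = fan_cost
        self.rng = random.Random(seed)

    def reset(self):
        # Sample new parameters so the policy cannot overfit to one room
        self.k_vent = self.rng.uniform(*self.k_vent_range)
        self.k_leak = self.rng.uniform(*self.k_leak_range)
        self.source = self.rng.uniform(*self.source_range)
        self.h_out = self.rng.uniform(4.0, 12.0)  # outdoor humidity, assumed range
        self.h_in = self.rng.uniform(6.0, 16.0)   # indoor humidity, assumed range
        self.t = 0
        return (self.h_in, self.h_out)

    def step(self, action):
        u = 1.0 if action else 0.0
        rate = self.k_leak + self.k_vent * u
        self.h_in += self.dt * (rate * (self.h_out - self.h_in) + self.source)
        self.t += 1
        reward = -self.h_in - self.fan_cost * u
        done = self.t >= self.horizon
        return (self.h_in, self.h_out), reward, done, {}


def run_episode(env, policy):
    """Roll out one episode and return the summed reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


if __name__ == "__main__":
    env = RandomizedVentilationEnv(seed=0)
    # Baseline rule: ventilate only when outdoor air is drier than indoor air
    print(run_episode(env, lambda obs: obs[1] < obs[0]))
```

An RL agent would be trained over many such episodes. The hand-written baseline rule in the usage example is only there to show the interface and to give a reference score to beat.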
