Which AI techniques and architectures are used in environments where predictions need to improve continually from user feedback? Take some kind of recommendation system, but not one that chooses among a fixed set of $n$ products: instead, one for a problem with a much larger output space. It is trained initially, but should keep improving from the feedback and corrections the user provides. The system should continue to improve its outputs on-the-fly in production, with each interaction.
(Deep) reinforcement learning seems like an obvious fit for this problem, but can this learning process really be deployed to production? Is it actually capable of improving results on-the-fly?
Are there any other techniques or architectures that can be used for this?
I'm looking for different approaches in general, so that I can compare them and find the right one for problems of this kind. Of course, there is always the option of retraining the whole network, but are there online, on-the-fly techniques that can be used to adjust the network instead?
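To make concrete what I mean by an on-the-fly adjustment, here is a minimal sketch, not a real system: a linear model that takes a single gradient step on each user correction. The class name `OnlineLinearModel`, the learning rate, and the simulated "user" inside `simulate` are all made up for illustration; a real network would take the place of the linear model.

```python
import random


class OnlineLinearModel:
    """Hypothetical linear predictor updated with one SGD step per correction."""

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr  # assumed learning rate, not tuned

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, corrected_y):
        # One gradient step on the squared error of this single interaction.
        err = self.predict(x) - corrected_y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
        return err  # error before the update was applied


def simulate(n_steps=2000, seed=0):
    """Simulate a stream of interactions with a made-up, noise-free 'user'."""
    rng = random.Random(seed)
    true_w, true_b = [2.0, -1.0, 0.5], 0.3  # hidden preference, illustrative only
    model = OnlineLinearModel(n_features=3)
    errors = []
    for _ in range(n_steps):
        x = [rng.uniform(-1, 1) for _ in range(3)]
        y = sum(w * xi for w, xi in zip(true_w, x)) + true_b  # the "correction"
        errors.append(abs(model.update(x, y)))
    return model, errors
```

Calling `simulate()` returns the model and the per-interaction absolute errors; in this toy setup the errors shrink as corrections accumulate, with no full retraining step.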