
I was wondering which AI techniques and architectures are used in environments where predictions need to continually improve based on user feedback. Take some kind of recommendation system, but not over a fixed set of $n$ products, rather over a higher-dimensional problem space. It's initially trained, but should keep improving from the feedback and corrections provided by the user. The system should continue to improve its outcomes on the fly in production, with each interaction.

Obviously, (deep) reinforcement learning seems to fit this problem, but can this learning process really be deployed to production? Is it actually capable of improving results on the fly?

Are there any other techniques or architectures that can be used for that?

I'm looking for different approaches in general, in order to be able to compare them and find the right one for problems of this kind. Of course, there is always the option to retrain the whole network, but I was wondering whether there are online, on-the-fly techniques that can be used to adjust the network.

nbro
convaldo
  • Can you please put your main **specific** question in the title? "Continuous learning" is not a question and is really very general. – nbro Jan 04 '21 at 11:11
  • Each type of system will have its own way of coping with this issue. Even within recommender systems there will be different approaches, and pragmatic solutions such as partial online learning – e.g. online updates to each user's profile can be done in real time, but training the aggregate model for all users is likely done offline on some routine (e.g. overnight). – Neil Slater Jan 04 '21 at 13:17
  • I agree here with nbro; the question needs more focus in order to be answerable. Please use [edit] to make your question clearer. If you have a specific type of recommender system in mind, it could help. – Neil Slater Jan 04 '21 at 13:22
  • Thanks for commenting. The point is that I'd like to compare different approaches, so I'm quite flexible about the technique. It's more about how, in general, something like this can be done: offline retraining, or on-the-fly adjustments. – convaldo Jan 04 '21 at 14:40
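The partial online-learning pattern Neil Slater's comment describes (real-time updates to each user's profile, periodic offline retraining of the aggregate model) could be sketched roughly as follows. All class and method names here are hypothetical, and the "models" are deliberately trivial placeholders:

```python
from collections import defaultdict


class HybridRecommender:
    """Hypothetical sketch: cheap per-user state updated online,
    a heavier aggregate model rebuilt offline on a schedule."""

    def __init__(self):
        # Per-user preference scores: updated in real time, per interaction.
        self.user_profiles = defaultdict(lambda: defaultdict(float))
        # Aggregate item scores: rebuilt offline (e.g. nightly).
        self.item_scores = {}
        self.interaction_log = []

    def record_feedback(self, user_id, item_id, reward):
        """Online path: runs on every interaction, must be fast."""
        profile = self.user_profiles[user_id]
        # Exponential moving average toward the observed reward.
        profile[item_id] += 0.1 * (reward - profile[item_id])
        self.interaction_log.append((user_id, item_id, reward))

    def retrain_offline(self):
        """Offline path: run on a schedule over the full interaction log."""
        totals, counts = defaultdict(float), defaultdict(int)
        for _, item_id, reward in self.interaction_log:
            totals[item_id] += reward
            counts[item_id] += 1
        self.item_scores = {i: totals[i] / counts[i] for i in totals}

    def score(self, user_id, item_id):
        """Blend the fresh per-user signal with the aggregate model."""
        personal = self.user_profiles[user_id][item_id]
        aggregate = self.item_scores.get(item_id, 0.0)
        return 0.5 * personal + 0.5 * aggregate
```

The design point is the split itself: the online path touches only one user's small state, so it stays cheap and safe to run in production, while the expensive aggregate retraining stays offline.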
