Consider a sequential decision-making problem over $T$ periods in which the parameters of the problem must be learned while an objective function is being optimized. One possibility is to model the problem as a dynamic program and solve it with reinforcement learning (RL) techniques. Alternatively, we can solve the problem period by period, where in each period we solve an optimization problem such as an LP or a MIP. In that case, we could apply Thompson sampling (TS) or upper confidence bound (UCB) algorithms in each period. I am wondering whether there are references that use this idea. Is there a general approach for solving this type of optimization problem?
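To make the per-period idea concrete, here is a minimal sketch of Thompson sampling wrapped around an optimization oracle (the combinatorial semi-bandit setting). Everything in it is an illustrative assumption: each item has an unknown Bernoulli success probability with a Beta(1, 1) prior, and the per-period problem is a tiny 0-1 knapsack. The brute-force `solve_oracle` is a hypothetical stand-in for a real LP/MIP solver. In each period the loop samples parameters from the posterior, solves the optimization problem with those sampled parameters, observes feedback for the chosen items, and updates the posterior.

```python
import itertools
import random


def solve_oracle(values, costs, budget):
    """Hypothetical stand-in for an LP/MIP solver: brute-force the 0-1 knapsack
    max sum(values[i] for i in S)  s.t.  sum(costs[i] for i in S) <= budget."""
    n = len(values)
    best_set, best_val = (), 0.0
    for r in range(1, n + 1):
        for subset in itertools.combinations(range(n), r):
            if sum(costs[i] for i in subset) <= budget:
                val = sum(values[i] for i in subset)
                if val > best_val:
                    best_set, best_val = subset, val
    return best_set


def thompson_sampling_with_oracle(true_p, costs, budget, horizon, seed=0):
    """Run TS for `horizon` periods; true_p is unknown to the learner and only
    used to simulate Bernoulli feedback."""
    rng = random.Random(seed)
    n = len(true_p)
    alpha = [1.0] * n  # Beta(1, 1) (uniform) prior on each success probability
    beta = [1.0] * n
    total_reward = 0
    for _ in range(horizon):
        # 1. Sample one parameter vector from the current posterior.
        sampled = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        # 2. Solve the per-period optimization problem with the sampled parameters.
        chosen = solve_oracle(sampled, costs, budget)
        # 3. Observe a Bernoulli outcome for every chosen item (semi-bandit feedback).
        for i in chosen:
            outcome = 1 if rng.random() < true_p[i] else 0
            total_reward += outcome
            # 4. Conjugate Beta-Bernoulli posterior update.
            alpha[i] += outcome
            beta[i] += 1 - outcome
    return total_reward, alpha, beta
```

For example, with `true_p = [0.2, 0.5, 0.7, 0.4]`, `costs = [1, 2, 2, 1]` and `budget = 3`, the best feasible set is items `(2, 3)`, and the sampled decisions should concentrate on it as the posteriors sharpen. A UCB variant would replace step 1 with optimistic upper-confidence estimates and keep the same oracle call.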

Amin