Consider a sequential decision-making problem over $T$ periods in which the problem parameters are unknown and must be learned while an objective function is optimized. One possibility is to model the problem as a dynamic program and solve it with reinforcement learning (RL) techniques. Alternatively, we can solve the problem period by period, where each period requires solving an optimization problem such as a linear program (LP) or a mixed-integer program (MIP). In that case, we could apply Thompson sampling (TS) or an upper confidence bound (UCB) algorithm in each period to handle the unknown parameters. Are there references that use this idea? Is there a general approach for solving this type of optimization problem?
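To make the period-by-period idea concrete, here is a minimal sketch of Thompson sampling wrapped around a per-period optimization. Everything specific in it is an assumption for illustration: the objective is linear with unknown coefficients `theta`, the feasible set is "pick at most `k` items" (so the LP relaxation is solved exactly by a greedy rule, standing in for a real LP/MIP solver call), feedback is observed per chosen item with Gaussian noise of known standard deviation, and the prior on each coefficient is $N(0, 1)$. The function names are hypothetical.

```python
import random


def solve_period_problem(theta, k):
    """Stand-in for an LP/MIP solver call (assumed problem, for illustration).

    Maximizes sum(theta[i] * x[i]) subject to sum(x) <= k and 0 <= x[i] <= 1.
    For this constraint set, choosing the (at most) k largest positive
    coefficients is optimal, so no external solver is needed here.
    """
    ranked = sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)
    return [i for i in ranked[:k] if theta[i] > 0]


def thompson_sampling(true_theta, k, T, noise_sd=1.0, seed=0):
    """Run T periods of Thompson sampling with a Gaussian model per coefficient."""
    rng = random.Random(seed)
    n = len(true_theta)
    # Assumed prior: each coefficient ~ N(0, 1), stored as mean and precision.
    mean = [0.0] * n
    precision = [1.0] * n
    obs_precision = 1.0 / noise_sd ** 2
    total_reward = 0.0

    for _ in range(T):
        # 1. Sample a plausible parameter vector from the current posterior.
        sampled = [rng.gauss(mean[i], precision[i] ** -0.5) for i in range(n)]
        # 2. Solve this period's optimization problem as if the sample were true.
        chosen = solve_period_problem(sampled, k)
        # 3. Observe noisy rewards for the chosen items and update the posterior.
        for i in chosen:
            y = true_theta[i] + rng.gauss(0.0, noise_sd)
            total_reward += y
            # Conjugate Gaussian update with known noise variance.
            mean[i] = (precision[i] * mean[i] + obs_precision * y) / (
                precision[i] + obs_precision
            )
            precision[i] += obs_precision

    return mean, precision, total_reward


if __name__ == "__main__":
    mean, precision, total = thompson_sampling([1.0, 0.2, -0.5], k=1, T=200)
    print("posterior means:", mean)
    print("total reward:", total)
```

In a real application, `solve_period_problem` would be replaced by a call to an actual LP or MIP solver, and the posterior update would depend on the feedback structure. A UCB variant would replace step 1 with optimistic estimates (for example, mean plus a multiple of the posterior standard deviation) instead of a random sample.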