
Can we efficiently solve a problem in which:

  • the valid actions at any given time are changing
  • the environment is stochastic
  • we have an infinite time horizon

using MCTS?

To be more specific, I'm attempting to optimize order batching and pick planning in a warehouse by formulating it as an MDP and searching for a good enough plan with MCTS. Within this environment:

  • The current state is fully known (what items need to be picked and where they are, the location of workers)
  • The set of possible actions is dependent on what items need to be picked
  • The state of the warehouse is changing stochastically over time (orders coming in)

MCTS appears to be a good fit for this problem: the branching factor is large, and we already have the ability to simulate rollouts of a policy together with the stochastically changing state.
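For concreteness, here is a minimal sketch of how I think of the environment as an MDP (all class, method, and parameter names here are hypothetical, not an existing API): the legal actions at any instant are exactly the picks still pending, and new orders may arrive stochastically between decisions.

```python
import random

# Hypothetical warehouse MDP sketch (names are mine, not a real library).
# State = pending picks + worker position on a 1-D toy layout;
# actions = which pending pick to do next;
# new orders arrive stochastically between decisions.

class WarehouseState:
    def __init__(self, pending, worker_pos, rng):
        self.pending = dict(pending)   # pick_id -> shelf position
        self.worker_pos = worker_pos
        self.rng = rng

    def legal_actions(self):
        # The valid actions at any instant are exactly the pending picks.
        return sorted(self.pending)

    def step(self, pick_id, arrival_rate=0.3):
        # Deterministic part: execute the chosen pick.
        loc = self.pending.pop(pick_id)
        reward = -abs(loc - self.worker_pos)   # negative travel distance
        self.worker_pos = loc
        # Stochastic part: a new order may arrive before the next decision.
        if self.rng.random() < arrival_rate:
            new_id = max(self.pending, default=pick_id) + 1
            self.pending[new_id] = self.rng.randrange(10)
        return reward

state = WarehouseState({189: 3, 734: 7, 334: 5}, worker_pos=0,
                       rng=random.Random(0))
first_reward = state.step(189)   # travel from 0 to shelf 3
```

The point is that `legal_actions()` must be re-queried after every `step`, because the pending set can both shrink (picks completed) and grow (orders arriving).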

This answer points towards an "open loop" formulation of the problem. However, a fixed sequence of actions (e.g. [&lt;pick #189&gt;, &lt;pick #734&gt;, &lt;pick #334&gt;, ...]) is not necessarily valid, since between each action there is a chance for the state to change (new orders arriving).

How can I formulate this for MCTS? Is it necessarily going to need "chance nodes"? Can we encode the state as a belief state instead?
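To illustrate what I mean by chance nodes, here is a hedged sketch of the structure I'm considering (the code and names are my own, not a library API): decision nodes choose among the currently legal actions, and each action leads to a chance node whose children are keyed by the sampled stochastic outcome, so the search stays closed-loop rather than committing to a fixed action sequence. The toy environment stands in for the warehouse.

```python
import math
import random

# Sketch of MCTS with explicit chance nodes (hypothetical, not a library).
# Decision node -> action -> chance node -> sampled outcome -> decision node.

class Node:
    def __init__(self):
        self.children = {}   # action -> chance Node, or outcome -> decision Node
        self.visits = 0
        self.total = 0.0

def rollout(state, env, rng, depth, max_depth):
    # Random default policy from a sampled successor state.
    ret = 0.0
    while depth < max_depth:
        acts = env.actions(state)
        if not acts:
            break
        state, _, r = env.step(state, rng.choice(acts), rng)
        ret += r
        depth += 1
    return ret

def simulate(node, state, env, rng, depth, max_depth=3, c=1.4):
    """One MCTS iteration from a decision node; returns the sampled return."""
    node.visits += 1
    acts = env.actions(state)
    if depth >= max_depth or not acts:
        return 0.0
    untried = [a for a in acts if a not in node.children]
    if untried:                              # expansion
        action, leaf = rng.choice(untried), True
        node.children[action] = Node()
    else:                                    # UCT over the currently legal actions
        action = max(acts, key=lambda a: node.children[a].total / node.children[a].visits
                     + c * math.sqrt(math.log(node.visits) / node.children[a].visits))
        leaf = False
    chance = node.children[action]
    next_state, outcome, reward = env.step(state, action, rng)  # sample the chance node
    child = chance.children.setdefault(outcome, Node())
    if leaf:
        ret = reward + rollout(next_state, env, rng, depth + 1, max_depth)
    else:
        ret = reward + simulate(child, next_state, env, rng, depth + 1, max_depth, c)
    chance.visits += 1
    chance.total += ret
    return ret

class ToyEnv:
    """Stand-in for the warehouse: 'safe' pays 1; 'risky' pays 3 with prob 0.2."""
    def actions(self, state):
        return ["safe", "risky"]
    def step(self, state, action, rng):
        if action == "safe":
            return state, "ok", 1.0
        hit = rng.random() < 0.2
        return state, ("hit" if hit else "miss"), (3.0 if hit else 0.0)

rng = random.Random(0)
root, env = Node(), ToyEnv()
for _ in range(2000):
    simulate(root, None, env, rng, 0)
best = max(root.children, key=lambda a: root.children[a].visits)
```

Because each chance node's children are keyed by the sampled outcome, the tree never stores a plan that assumes a particular realization of the stochastic order arrivals; it only stores values conditioned on what was actually sampled.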

nobillygreen
  • I am not sure I see what the problem is. As you say, the state is known and you can determine the set of allowed actions from that. Each node in the MCTS search tree also corresponds to one unique state. So, you should be able to determine what outward branches (corresponding to actions) are possible from that node and apply this knowledge, yes? – mikkola Oct 21 '22 at 08:05

0 Answers