There are many methods and algorithms for solving planning problems.
If I understand correctly, according to Wikipedia, a classical planning problem has:
- a unique known initial state,
- duration-less actions,
- deterministic actions,
- which can be taken only one at a time,
- and a single agent.
Classical planning problems can be solved with classical planning algorithms; the STRIPS framework for problem description and solution (using backward chaining) and the GraphPlan algorithm can be mentioned here.
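To make that concrete, here is a minimal toy sketch of a STRIPS-style representation (a made-up domain of my own, not from any library): a state is a set of facts, an action has preconditions plus add and delete lists, and a plan is found here by a plain forward breadth-first search rather than backward chaining, just to show the idea.

```python
from collections import deque

# Toy STRIPS-style representation: a state is a frozenset of facts,
# an action has preconditions, an add list and a delete list.
ACTIONS = {
    "pick_up_key": {"pre": {"at_door", "key_on_floor"},
                    "add": {"has_key"}, "del": {"key_on_floor"}},
    "unlock_door": {"pre": {"at_door", "has_key"},
                    "add": {"door_open"}, "del": set()},
    "go_through":  {"pre": {"at_door", "door_open"},
                    "add": {"in_room"}, "del": {"at_door"}},
}

def applicable(state, action):
    return ACTIONS[action]["pre"] <= state

def apply_action(state, action):
    a = ACTIONS[action]
    return frozenset((state - a["del"]) | a["add"])

def forward_search(init, goal):
    """Plain breadth-first forward search over the state space."""
    frontier = deque([(frozenset(init), [])])
    visited = {frozenset(init)}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for name in ACTIONS:
            if applicable(state, name):
                nxt = apply_action(state, name)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None

print(forward_search({"at_door", "key_on_floor"}, {"in_room"}))
# -> ['pick_up_key', 'unlock_door', 'go_through']
```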
If actions are non-deterministic, then, according to Wikipedia, we have a Markov Decision Process (MDP), with:
- duration-less actions,
- non-deterministic actions with probabilities,
- full observability (or partial observability, in which case it is a POMDP),
- maximization of a reward function,
- and a single agent.
MDPs are commonly solved with Reinforcement Learning, especially when the transition model is not known explicitly.
Obviously, classical planning problems can also be formulated as MDPs (with state-transition probabilities of 1, i.e. deterministic actions), and there are many examples (e.g. some OpenAI Gym environments) where these are successfully solved by RL methods.
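As a sketch of that reformulation (a toy example of my own, not an actual Gym environment): the deterministic transition function plays the role of an MDP whose transitions all have probability 1, and tabular Q-learning recovers the shortest path to the goal.

```python
import random
from collections import defaultdict

# Toy deterministic MDP: states 0..N-1 on a line, actions move left/right,
# every transition has probability 1; reward -1 per step, 0 on reaching the goal.
N, GOAL, ACTIONS = 6, 5, (-1, +1)

def step(state, action):
    nxt = min(max(state + action, 0), N - 1)
    reward = 0.0 if nxt == GOAL else -1.0
    return nxt, reward, nxt == GOAL

Q = defaultdict(float)
alpha, gamma, eps = 0.5, 0.95, 0.1

for _ in range(500):                      # tabular Q-learning episodes
    s = 0
    while True:
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda a: Q[(s, a)])
        s2, r, done = step(s, a)
        target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
        if done:
            break

# Greedy policy recovered from Q: always move right toward the goal.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N - 1)])
# -> [1, 1, 1, 1, 1]
```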
Two questions:
Are there some characteristics of a classical planning problem that make an MDP formulation and Reinforcement Learning a better-suited solution method? Better-suited in the sense that it finds a solution, or a (near-)optimal solution, faster.
How do graph search methods like A* perform on classical planning problems? Does STRIPS with backward chaining or GraphPlan always outperform A*? Outperform in the sense of finding the optimal solution faster.
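For concreteness, this is roughly the kind of A* formulation I have in mind, reusing the toy STRIPS domain from the sketch above (`ACTIONS`, `applicable`, `apply_action`). The "number of unsatisfied goal facts" heuristic is only illustrative and is not admissible in general, so this is a sketch rather than a serious planner.

```python
import heapq
from itertools import count

def astar(init, goal):
    """A* over the STRIPS state space above, with the toy heuristic
    h(s) = number of goal facts not yet satisfied in s."""
    h = lambda s: len(goal - s)
    start = frozenset(init)
    tie = count()                          # tiebreaker so heapq never compares frozensets
    frontier = [(h(start), next(tie), start, [])]
    best_g = {start: 0}
    while frontier:
        _f, _, state, plan = heapq.heappop(frontier)
        if goal <= state:
            return plan
        for name in ACTIONS:               # ACTIONS / applicable / apply_action as defined above
            if applicable(state, name):
                nxt = apply_action(state, name)
                g = len(plan) + 1          # unit action costs
                if g < best_g.get(nxt, float("inf")):
                    best_g[nxt] = g
                    heapq.heappush(frontier, (g + h(nxt), next(tie), nxt, plan + [name]))
    return None

print(astar({"at_door", "key_on_floor"}, {"in_room"}))
# -> ['pick_up_key', 'unlock_door', 'go_through']
```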