
Goal-oriented action planning (GOAP) is a well-known planning technique in computer games. It was introduced to control the non-player characters in the game F.E.A.R. (2005) by creating an abstract state model. As in STRIPS planning, each GOAP action consists of a name, a precondition and an effect. The domain knowledge is stored in these actions.

The bottleneck of GOAP is that, before the planner can bring the system into the goal state, the action model has to be typed in by hand. Usually, the programmer defines actions like "walk to", "open the door", "take the object", and identifies for each of them the feature set for the precondition and the effect.

In theory, this challenging task can be simplified with a decision tree learning algorithm. A decision tree stores the observed features in a tree and creates the rules on its own with inductive learning. A typical example of the C4.5 algorithm is to find a rule like "if the weather is sunny, then play tennis". Unfortunately, the vanilla tree learning algorithm doesn't distinguish between different actions.
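For illustration, here is a minimal sketch of the "play tennis" example. Note that scikit-learn implements CART rather than C4.5, but the induced rules are comparable for a toy data set like this one; the data values are made up.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy observations: weather encoded as sunny=0, rainy=1, overcast=2.
weather = [[0], [0], [1], [1], [2], [0]]
play = ["yes", "yes", "no", "no", "yes", "yes"]

# The tree induces a rule like "if the weather is sunny, then play tennis"
# purely from the observations.
tree = DecisionTreeClassifier().fit(weather, play)
print(export_text(tree, feature_names=["weather"]))
```

The printed rules are all the learner produces; nothing ties them to a named action like "walk to", which is exactly the limitation the question asks about.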

Is it possible to modify the C4.5 algorithm such that the GOAP actions, like "walk to", "open the door", etc., are connected to individual rules?

nbro

1 Answer


You're going to run straight into the qualification problem, the frame problem and the ramification problem on these. Plus you'll need to get your head round what an action is and what it means.

Qualification Problem

A more formal "STRIPS" would be Situation Calculus (logical equivalency demonstrated by Reiter), which can be defined in FOL or SOL, depending on who you talk to. In Situation Calculus you have your actions and you define when they are possible (S is the current Situation):

poss(open_door, S) → holds(door_position(closed) ^ door_lock(unlocked), S).

This is true: when it is possible to open the door, it is both closed and unlocked. The difficulty arises when we try to flip that implication around:

poss(open_door, S) ← holds(door_position(closed) ^ door_lock(unlocked), S).

This is not true, because there must also be nothing blocking the door, the door can't be wedged too tightly into the door frame, and a whole bunch of other stuff that we might not think of. This is the qualification problem, which we get around in STRIPS by ignoring it and just flipping the implication.
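That "just flip the implication" shortcut can be made concrete with a small STRIPS-style sketch (the fluent names here are hypothetical): an action is declared possible whenever its listed preconditions hold, full stop.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    precondition: frozenset
    add: frozenset = frozenset()
    delete: frozenset = frozenset()

    def possible(self, state: set) -> bool:
        # The flipped implication: preconditions hold => action possible.
        # This simply ignores the qualification problem.
        return self.precondition <= state

    def apply(self, state: set) -> set:
        return (state - self.delete) | self.add

open_door = Action(
    name="open_door",
    precondition=frozenset({"door_closed", "door_unlocked"}),
    add=frozenset({"door_open"}),
    delete=frozenset({"door_closed"}),
)

state = {"door_closed", "door_unlocked"}
assert open_door.possible(state)
# ...even if the door is in fact wedged shut: nothing in the model can say so.
```

The wedged door, the blocked doorway and all the rest simply have no place in this representation, which is what makes STRIPS tractable and incomplete at the same time.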

In your case you'll end up learning a whole bunch of irrelevant qualifications:

 poss(open_door, S) ← holds(door_position(closed) ^ door_lock(unlocked) ^ weather(sunny) ^ grass(green), S).

You'd need some way to trim them down. By observing gameplay for your data, you'll only learn the conditions under which players chose to do an action, not those under which it was possible to do that action: the old correlation vs causation problem. You can immediately prune things that aren't fluents, but as far as I'm aware there's no good solution beyond letting the computer simulate, trying actions in subsets of the fluents that hold, in order to prune.
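Such simulation-based pruning could look roughly like the following sketch. Everything here is hypothetical: `simulate(state, action)` stands in for an assumed oracle (the game engine) that reports whether the action succeeds in a given state, and we drop any learned fluent whose removal doesn't stop the action from succeeding.

```python
def prune_preconditions(candidates, state, action, simulate):
    """Keep only the learned fluents the action actually needs."""
    necessary = set()
    for fluent in candidates:
        reduced = state - {fluent}
        if not simulate(reduced, action):  # action fails without this fluent
            necessary.add(fluent)
    return necessary

# Toy simulator: opening the door only truly needs these two fluents.
def simulate(state, action):
    return {"door_closed", "door_unlocked"} <= state

learned = {"door_closed", "door_unlocked", "weather_sunny", "grass_green"}
state = set(learned)
print(prune_preconditions(learned, state, "open_door", simulate))
# weather_sunny and grass_green, the irrelevant qualifications, are dropped.
```

Note this only tests fluents one at a time; jointly redundant conditions would need subset probing, which is where the combinatorial cost comes in.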

Frame Problem

The result of an action in the language of Situation Calculus is that it causes some fluent to hold. So in our example we have a door_position fluent, and we can say that opening the door causes door_position(open) to hold. But what about some other action, such as making tea? We know it has no effect on the position of the door, but for the computer to know that, we need to tell it:

holds(door_position(open), S) → holds(door_position(open), do(make_tea, S)).

This creates far too many axioms. Reiter has a working solution to this for Golog in his book "Knowledge in Action". In your case it will manifest as missing axioms, i.e. not knowing what the result of an action will be for some fluent. Try beginning with the assumption that an action has no effect unless proven otherwise.
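That "no effect unless proven otherwise" default amounts to persistence by default: fluents carry over across an action unless a learned effect axiom explicitly adds or deletes them. A minimal sketch (with hypothetical fluent and action names):

```python
# Learned effect axioms: only what each action changes is recorded.
effects = {
    "open_door": {"add": {"door_open"}, "delete": {"door_closed"}},
    "make_tea": {"add": {"tea_ready"}, "delete": set()},
}

def do(action, state):
    # Default persistence: anything not deleted carries over,
    # so no explicit frame axioms are required.
    eff = effects.get(action, {"add": set(), "delete": set()})
    return (state - eff["delete"]) | eff["add"]

s0 = {"door_open"}
s1 = do("make_tea", s0)
# door_open persists through make_tea without an axiom saying so:
assert "door_open" in s1
```

This is essentially the spirit of Reiter's successor state axioms, compressed into set arithmetic.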

Ramification Problem

The ramification problem is very similar to the frame problem: you might not attribute all of the effects of an action to that action. Opening a door can cause changes in air pressure through a house that can cause another door to slam; such an effect is easy to miss. For this you'd need a sensitive algorithm that can correctly recognise rare events that are important, which is a big ask. The solution is likely to be to ignore it.

Closely related, though not usually a problem for such systems: you could also attribute effects to an action incorrectly. How will you distinguish what the action caused from what merely occurred at the same time? From some Mario game data it could likely be induced that jumping causes a turtle to move closer to Mario; again correlation vs causation, and a method to distinguish them is needed. The solution here is likely to come from increasing the volume of data.

Definition and Granularity Problem

OK, it's probably possible that with enough data and a sensitive training algorithm you can overcome the above problems. The definition problem is merely conceptual and easily overcome.

Humans tend to define an action by its name, say for example "turning on a light". However, in Situation Calculus the action is defined by when it is possible and the result of that action; the name is merely a label. So "turning on a light" where the bulb is broken and where it is not may share the same label, but they are different actions.

Furthermore, in STRIPS and Situation Calculus we're defining "types" or "classes" of actions, the actual execution is an "instance" of that "class". Your problem is to derive the abstract from the concrete.

For you this problem will arise in terms of defining the granularity, as each instance of an action is possible in the unique situation that it occurred in, and caused the unique situation that came after it.

How do you group these actions together to abstract the most similar ones? How do you know how many actions there are to create these groups? If it's not based on a limited set of user-inputs to some computer game, then you might want to draw some ideas from the research into clustering, particularly heuristics of how many clusters to use.
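One way to picture the clustering idea: embed each action instance in a space with one dimension per fluent and let a cluster-count heuristic decide how many abstract actions there are. A hypothetical sketch using k-means and the silhouette score (the data is fabricated to stand in for logged action instances):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Fake instances of two underlying action types, over 3 fluent dimensions.
instances = np.vstack([
    rng.normal(loc=[1, 0, 0], scale=0.1, size=(20, 3)),  # e.g. "open door"
    rng.normal(loc=[0, 1, 1], scale=0.1, size=(20, 3)),  # e.g. "take object"
])

# Pick the number of action classes by silhouette score.
best_k, best_score = None, -1.0
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(instances)
    score = silhouette_score(instances, labels)
    if score > best_score:
        best_k, best_score = k, score

print(best_k)  # with two well-separated blobs, the heuristic should pick 2
```

In practice the fluent space would be high-dimensional and largely symbolic, so a distance metric over fluents is itself a design problem, but the cluster-count heuristic carries over.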

Alternative to a decision tree

If I were to tackle this problem (fund a Post-Doc for me and I'd be tempted), I'd first investigate Inductive Logic Programming (ILP) as a method to learn the relations, using Golog as the foundation of my action axioms. Then I'd look at clustering based on a space where each fluent is a dimension. I imagine it would be a rather useful tool, even if it merely created a pretty accurate template for manual review.

Paul Brown