There are two intermixed elements in your question:
1. Why have people already studied causality in AI?
2. Why would people like to continue studying causality in AI?
tl;dr: AI systems can't function in the real world without some way of understanding uncertainty. The best ways we have to understand uncertainty don't work well unless we also understand causal structure. Getting most of the way towards this understanding was a huge accomplishment for AI research, but there's still a big problem waiting to be solved here.
For part 1, let's think about where research into causality comes from. Pearl originally set out to solve a major problem plaguing AI researchers in the 1980s: reasoning under uncertainty. This area is so central to AI systems that it has its own large conference: UAI. The need for reasoning under uncertainty arose because the real world is filled with uncertainty. Consider even a simple problem, like having a robot navigate an empty room. Even if the robot knows its initial position exactly, it is not likely to know its true position for long. The robot's wheels might have a certain specification, but wheels stick and slip (friction), and they do so unpredictably (e.g. maybe one part of the floor didn't get polished as much as the others). The robot might have sensors, but sensors are imperfect. Does a light sensor value of 0.25 mean a wall is near, or just that the sun is coming through the window at an unfortunate angle? Or maybe just that the shade of paint is slightly different there? Does our acoustic sensor reading of 1.35 mean that we're 1.35 meters from the wall, or did the signal get reflected and return to us by a different path?
One of Pearl's major contributions was showing how we can use the rules of probability to reason about these events correctly. Although the basic rules of probability had been worked out long before, Pearl proposed the idea of Bayesian networks, which made this kind of reasoning tractable and systematic. Thrun and many others used these techniques to solve problems like the robot navigation task discussed above.
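To make this concrete, here is a minimal hand-rolled sketch of the kind of inference a (two-node) Bayesian network supports, applied to the light-sensor question above. All the probabilities are made-up illustrative numbers, and this is plain Bayes' rule rather than any of Pearl's full network algorithms:

```python
# Toy two-node network: Wall -> LowReading.
# We observe a low sensor reading and ask how likely a wall really is.

def posterior_wall_given_low_reading(p_wall, p_low_given_wall, p_low_given_no_wall):
    """Bayes' rule: P(wall | low reading)."""
    joint_wall = p_wall * p_low_given_wall              # P(wall, low)
    joint_no_wall = (1 - p_wall) * p_low_given_no_wall  # P(no wall, low)
    return joint_wall / (joint_wall + joint_no_wall)

# Made-up numbers: a 10% prior on a nearby wall; a low reading is much
# more likely when a wall is present, but sunlight can also cause it.
p = posterior_wall_given_low_reading(0.10, 0.80, 0.15)
print(round(p, 3))  # the low reading raises, but doesn't settle, our belief
```

The point is that the network's structure (what depends on what) tells us exactly which probabilities we need, and the rules of probability combine them correctly.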
A problem with Bayesian networks is that, if built incorrectly, they lose much of their efficiency benefits. The correct construction is usually the one that best captures the known causal relations between factors. Further, algorithms for inference in a Bayesian network cannot easily answer questions about counterfactuals. They cannot tell our robot what would happen in a world where certain actions were taken. They can only say whether certain behaviors are likely to co-occur.
These outstanding issues led Pearl to work on causality and causal networks. It is important for our systems to be able to answer counterfactual questions, because often the answers to those questions determine how the system ought to act. Pearl's do-calculus (nicely summarized here, and in Pearl's The Book of Why) has solved this problem, and shown how causes can often be inferred from observation alone. It's super exciting!
Hopefully you're now convinced that causality was worth studying within AI. You might now wonder why it's still worth studying.
The main open problem right now is that to do causal reasoning, we need to already have a network that describes the causal relations. These networks are hard to construct by hand, so we'd like to be able to learn their structure from data. Unfortunately, this looks like a very hard problem, both to solve exactly (it's #P-hard, if I recall correctly) and to approximate. There are no known general algorithms, at least as of 2018. Getting this would be a major breakthrough in AI research, and more generally, in statistics, philosophy of science, and epistemology. Even getting something that mostly worked, most of the time, would be huge. Peter van Beek has been doing some work on this recently, which could be a good starting point if you want to read more.