
In 2004 Jeff Hawkins, inventor of the Palm Pilot, published a very interesting book called On Intelligence, in which he details a theory of how the human neocortex works.

This theory, called the Memory-Prediction framework, has some striking features: for example, information processing that is not only bottom-up (feedforward) but also top-down (feedback), and the ability to make simultaneous but discrete predictions of different future scenarios (as described in this paper).

The promise of the Memory-Prediction framework is the unsupervised generation of stable, high-level representations of future possibilities, something that would probably revolutionise a whole bunch of AI research areas.
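To make "simultaneous but discrete predictions" concrete, here is a toy sketch (my own illustration, not Hawkins's or Numenta's actual algorithm): a sequence memory that keeps every learned continuation of a context alive as a separate prediction, rather than blending them into one forecast.

```python
from collections import defaultdict

class ToySequenceMemory:
    """Toy first-order sequence memory, loosely inspired by the
    Memory-Prediction idea: it learns observed transitions and then
    predicts *all* plausible next symbols at once, as discrete
    alternatives rather than a single averaged guess."""

    def __init__(self):
        self.transitions = defaultdict(set)

    def learn(self, sequence):
        for prev, nxt in zip(sequence, sequence[1:]):
            self.transitions[prev].add(nxt)

    def predict(self, symbol):
        # Multiple simultaneous, discrete predictions of the future.
        return self.transitions[symbol]

memory = ToySequenceMemory()
memory.learn("ABCD")
memory.learn("ABXY")
print(sorted(memory.predict("B")))  # ['C', 'X'] -- both futures stay live
```

A real HTM would use sparse distributed representations and high-order temporal context rather than single symbols; the sketch only shows the prediction semantics.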

Hawkins founded a company and proceeded to implement his ideas. Unfortunately, more than ten years later, the promise of those ideas is still unfulfilled. So far the implementation is only used for anomaly detection, which is in a sense the opposite of what you really want to do: instead of extracting the understanding, you extract the instances that your artificial cortex doesn't understand.
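The anomaly-detection point can be sketched in a few lines (an illustrative toy, not NuPIC's actual anomaly score): a prediction-based model flags exactly those transitions it failed to predict.

```python
from collections import defaultdict

transitions = defaultdict(set)  # learned symbol -> possible successors

def learn(sequence):
    for prev, nxt in zip(sequence, sequence[1:]):
        transitions[prev].add(nxt)

def anomaly_scores(sequence):
    # 0.0 where the next symbol was predicted, 1.0 where it surprised us.
    return [0.0 if nxt in transitions[prev] else 1.0
            for prev, nxt in zip(sequence, sequence[1:])]

learn("ABCABCABC")
print(anomaly_scores("ABCABX"))  # only the final transition is flagged
```

The output is the list of "surprises"; the understanding itself, the learned model, stays locked inside the predictor, which is the asymmetry complained about above.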

My question is: in what way does Hawkins's framework fall short? What are the concrete or conceptual problems that have so far prevented his theory from working in practice?

BlindKungFuMaster

2 Answers


The short answer is that Hawkins' vision has yet to be implemented in a widely accessible way, particularly the indispensable parts related to prediction.

The long answer is that I read Hawkins' book a few years ago and was excited by the possibilities of Hierarchical Temporal Memory (HTM). I still am, despite the fact that I have a few reservations about some of his philosophical musings on the meanings of consciousness, free will and other such topics. I won't elaborate on those misgivings here because they're not germane to the main, overwhelming reason why HTM nets haven't succeeded as much as expected to date: to my knowledge, Numenta has only implemented a truncated version of his vision. They left out most of the prediction architecture, which plays such a critical role in Hawkins' theories. As Gerod M. Bonhoff put it in an excellent thesis [1] on HTMs:

"In March of 2007, Numenta released what they claimed was a “research implementation” of HTM theory called Numenta Platform for Intelligent Computing (NuPIC). The algorithm used by NuPIC at this time is called “Zeta1.” NuPIC was released as an open source software platform and binary files of the Zeta1 algorithm. Because of licensing, this paper is not allowed to discuss the proprietary implementation aspects of Numenta’s Zeta1 algorithm. There are, however, generalized concepts of implementation that can be discussed freely. The two most important of these are how the Zeta 1 algorithm (encapsulated in each memory node of the network hierarchy) implements HTM theory. To implement any theory in software, an algorithmic design for each aspect of the theory must be addressed. The most important design decision Numenta adopted was to eliminate feedback within the hierarchy and instead choose to simulate this theoretical concept using only data pooling algorithms for weighting. This decision is immediately suspect and violates key concepts of HTM. Feedback, Hawkins’ insists, is vital to cortical function and central to his theories. Still, Numenta claims that most HTM applicable problems can be solved using their implementation and proprietary pooling algorithms."

I am still learning the ropes in this field and cannot say whether or not Numenta has since scrapped this approach in favor of a full implementation of Hawkins' ideas, especially the all-important prediction architecture. Even if they have, this design decision has probably delayed adoption by many years. That's not a criticism per se; perhaps the computational costs of tracking prediction values and updating them on the fly were too much to bear at the time, on top of the ordinary costs of processing neural nets, leaving them with no other path except to try half-measures like their proprietary pooling mechanisms.

Nevertheless, all of the best research papers I've read on the topic since then have chosen to reimplement the algorithms rather than rely on Numenta's platform, typically because of the missing prediction features. Cases in point include Bonhoff's thesis and Maltoni's technical report for the University of Bologna Biometric System Laboratory [2]. In both cases, however, there is no readily accessible software for putting their variant HTMs to immediate use (as far as I know). The gist of all this is that, like G.K. Chesterton's famous maxim about Christianity, "HTMs have not been tried and found wanting; they have been found difficult, and left untried." Since Numenta left out the prediction steps, I assume that they would be the main stumbling blocks awaiting anyone who wants to code Hawkins' full vision of what an HTM should be.

[1] Bonhoff, Gerod M., 2008. Using Hierarchical Temporal Memory for Detecting Anomalous Network Activity. Thesis presented March 2008 at the Air Force Institute of Technology, Wright-Patterson Air Force Base, Ohio.

[2] Maltoni, Davide, 2011. Pattern Recognition by Hierarchical Temporal Memory. DEIS Technical Report, April 13, 2011. University of Bologna Biometric System Laboratory, Bologna, Italy.

SQLServerSteve
  • Great answer! I want to add that apparently IBM is giving it a shot now: https://www.technologyreview.com/s/536326/ibm-tests-mobile-computing-pioneers-controversial-brain-algorithms/ – BlindKungFuMaster Sep 22 '16 at 05:28
  • Is this answer still current? – claytond Oct 08 '20 at 03:41
  • @claytond That's a good question, and one I don't know the answer to. I read up on the latest neural net research every month or so though & haven't seen anything fresh on the topic since this. There's always some fresh news from some unexpected niche of AI that rivets everyone's attention for a few years, so that past ideas are forgotten before they're fully implemented. It's a longstanding pattern. I recently had to explain on CrossValidated why the buzz over fuzzy sets seems to have died way down in the past 10 years - it's basically the same dynamic. Too many promising leads, too little time LOL – SQLServerSteve Oct 08 '20 at 09:11

10 years to production ready?

Let's put that in perspective. The perceptron was introduced in 1957, but it did not really start to flower as a usable model until the release of the PDP books in 1986. For those keeping score: 29 years.

From the PDP books, we did not see those ideas elaborated into usable deep networks until the last decade. If you take the Andrew Ng and Jeff Dean cat recognition task as the defining event for deep networks, that's 2012. Arguably more than 25 years to production ready.

https://en.wikipedia.org/wiki/Timeline_of_machine_learning

  • That is not an answer to the question. Also, we now already have computers that are fast enough for some very impressive AI achievements, but those achievements don't happen in HTM. – BlindKungFuMaster Sep 06 '18 at 11:32