
In a video lecture on the development of neural networks and the history of deep learning (you can start from minute 13), the lecturer (Yann LeCun) said that the development of neural networks stopped until the 80s because people were using the wrong neurons (binary, hence discontinuous), and that this was due to the slowness of multiplying floating-point numbers, which made the use of backpropagation really difficult.

He said, I quote, "If you have continuous neurons, you need to multiply the activation of a neuron by a weight to get a contribution to the weighted sum."

But that statement stays true even with binary neurons (or any discontinuous activation function). Am I wrong? (At least once you're in a hidden layer, the output of your neuron will still be multiplied by a weight, I guess.) The same professor said that the perceptron and ADALINE relied on weighted sums, so they were computing multiplications anyway.
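
To make my confusion concrete, here is a minimal sketch (my own illustration in Python, not something from the lecture) of how I picture the two cases: with continuous activations every term of the weighted sum needs a genuine floating-point multiplication, whereas with binary activations each weight is either added or skipped.

    # Minimal sketch (my own illustration): the weighted sum of one neuron's inputs.

    def weighted_sum_continuous(activations, weights):
        # Continuous activations: every term needs a real floating-point multiplication.
        return sum(a * w for a, w in zip(activations, weights))

    def weighted_sum_binary(activations, weights):
        # Binary activations (0 or 1): no real multiplication is needed;
        # each weight is either added to the sum or skipped.
        return sum(w for a, w in zip(activations, weights) if a == 1)

    print(weighted_sum_continuous([0.33, 0.91, 0.05], [0.79, -1.2, 0.4]))
    print(weighted_sum_binary([1, 0, 1], [0.79, -1.2, 0.4]))

So the weighted sum is still there in both cases, but maybe the cost of the "multiplications" is what actually differs, and that is the point I'm missing.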

I don't know what I'm missing here, and I hope someone will enlighten me.

Daviiid
    Not answering your question directly, but I think a large contribution to the halted development was the lack of computing power. We simply couldn't easily test proposed methods because computers were too slow at the time – Recessive Feb 25 '21 at 03:03
  • Multiplying a real number by 1 or 0 is a lot different from multiplying it by another real number. – DuttaA Feb 25 '21 at 03:06
  • @Recessive thank you for the answer. I take it, then, that there was a general lack of computing power, not specifically a lack of power for multiplying floating-point numbers. – Daviiid Feb 25 '21 at 03:17
  • @DuttaA I agree, but nowhere in the video is it said that the weights are 0 or 1. Sure, the activation function outputs zero or one, but the weights can be any real number. I mean, the professor said that the perceptron was equipped with potentiometers to vary the weights of the connections, so the weights were real numbers. – Daviiid Feb 25 '21 at 03:20
  • 0.33*1 = easy, 0.33*0.79 = tough. – DuttaA Feb 25 '21 at 03:54
  • @Daviiid Not necessarily, the GPU only became widely available in the early 2000s, and it is what is primarily used now for fast floating-point multiplication, so in the 50s-80s this was an even greater disadvantage than the other problems. – Recessive Feb 25 '21 at 04:33
  • @DuttaA oh yeah, I didn't think about it, thank you! – Daviiid Feb 25 '21 at 17:00
  • @Recessive I see, thank you for your answer. – Daviiid Feb 25 '21 at 17:01
  • https://ai.stackexchange.com/questions/13233/why-did-ml-only-become-viable-after-nvidias-chips-were-available/13238#13238 – I answered something similar there; it might give you a broad overview. Regarding your question, it is easy to answer if you get hold of some old prof in the field of semiconductor technology. – Feb 26 '21 at 16:47

1 Answer


I will first address your main question: "Why did the development of neural networks stop between the 50s and the 80s?" In the 40s-50s there was a lot of progress (McCulloch and Pitts); the perceptron was invented (Rosenblatt). That gave rise to an AI hype with many promises (exactly like today)!

However, Minsky and Papert proved in 1969 that a single-layer architecture is not enough to build a universal approximating machine (see e.g. Minsky, M. & Papert, S. Perceptrons: An Introduction to Computational Geometry, vol. 165 (1969)). That led to the first disappointment in "AI", which lasted until several major breakthroughs in the 1980s: the proof of the universal approximation capabilities of the multi-layer perceptron by Cybenko, the popularisation of the backpropagation algorithm (Hinton and colleagues), etc.
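
To make that limitation concrete: the canonical counterexample is XOR, which is not linearly separable, so no single-layer perceptron can compute it. Here is a minimal sketch (my own illustration, not taken from the book) of the perceptron learning rule succeeding on AND but never fitting XOR:

    # Minimal sketch: the classic perceptron learning rule fits AND, which is
    # linearly separable, but can never fit XOR, which is not.

    def train_perceptron(samples, epochs=100, lr=0.1):
        w, b = [0.0, 0.0], 0.0
        for _ in range(epochs):
            for x, target in samples:
                out = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
                err = target - out
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        # Count misclassified points after training.
        return sum(1 for x, t in samples
                   if (1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0) != t)

    AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    print("AND errors after training:", train_perceptron(AND))  # converges to 0
    print("XOR errors after training:", train_perceptron(XOR))  # always > 0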

I agree with LeCun that using continuous activation functions is what enabled the backpropagation algorithm at the time. It is only recently that we have learned to backpropagate through networks with binary activation functions (2016!).
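
Presumably the 2016 work refers to binarized neural networks; the trick commonly used there is the straight-through estimator: keep the hard binary activation in the forward pass, but substitute a surrogate derivative (the identity, clipped to [-1, 1]) in the backward pass. Here is a minimal sketch of that idea (my own illustration, assuming the straight-through estimator is what is meant):

    import numpy as np

    def binary_forward(x):
        # Discontinuous (binary) activation: -1 or +1.
        return np.where(x >= 0, 1.0, -1.0)

    def binary_backward(x, grad_output):
        # The true derivative of the step is zero almost everywhere; the
        # straight-through estimator replaces it with 1 inside [-1, 1].
        return grad_output * (np.abs(x) <= 1.0)

    # Toy usage: one input, one weight, squared-error loss on one sample.
    x, w, target = np.array([0.4]), np.array([0.3]), np.array([-1.0])
    pre = w * x                                 # pre-activation
    out = binary_forward(pre)                   # binary output (+1 here)
    loss = 0.5 * (out - target) ** 2            # = 2.0
    grad_out = out - target                     # dL/dout
    grad_pre = binary_backward(pre, grad_out)   # dL/dpre via straight-through
    grad_w = grad_pre * x                       # dL/dw
    print(loss, grad_w)

With the true derivative of a binary step, which is zero almost everywhere, no gradient would ever reach the weights, which is exactly why continuous activations mattered so much for backpropagation.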

penkovsky
  • Thank you for your answer! LeCun also says in the same lecture that we can approximate any function with an SVM; is that related to the universal approximating machine you mentioned? And do you perhaps know in which article it was proved that we can approximate any function with an SVM, please? And it's crazy that researchers only recently found a method to train networks with binary activation functions, as if it were theoretically known that backpropagation works with discontinuous functions but we weren't able to do it in practice. – Daviiid Mar 13 '21 at 01:01
  • He probably refers to the Universal approximation theorem https://en.wikipedia.org/wiki/Universal_approximation_theorem. You may want to read e.g. this article http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.441.7873&rep=rep1&type=pdf – penkovsky Mar 14 '21 at 13:43
  • Thank you for your clarification and for the article. (I tried to cite you, penkovsky, with the @, but it's not working, I'm sorry.) – Daviiid Mar 14 '21 at 13:58
  • Happy to help. You may want to accept the answer if you find it satisfying. – penkovsky Mar 14 '21 at 13:59
  • Yes, that's right, I always forget to do it. Thank you! – Daviiid Mar 16 '21 at 00:01