2

The Intel 8080 had 4,500 transistors and ran at 2-3.125 MHz. By comparison, an 18-core Haswell Xeon E5 has 5,560,000,000 transistors and can run at 2 GHz. Would it be possible or prudent to simulate a neural network by packing a chip chock-full of a million interconnected, slightly modified Intel 8080s (sped up to run at 2 GHz)? If each one modeled 100 neurons, you could simulate a neural network with 100 million neurons on a single chip.

Edit: I'm not proposing that you actually use a million Intel 8080s; rather, I'm proposing that you take a highly minimal programmable chip design like the Intel 8080's and pattern it across a wafer as densely as possible, with interconnects, so that each instance can function as one or a few dozen fully programmable neurons, each with a small amount of memory. I'm not proposing that someone take a million physical Intel 8080s and hook them together.

Alecto Irene Perez

3 Answers

1

The building block of a neural network is called a perceptron. It cannot be represented by a single transistor, because it has to hold an arbitrary (floating-point) value across multiple computational iterations, while a transistor is only binary and does not work as memory on its own.

Furthermore, the strength of a NN lies in its flexibility, which you would lose if you baked it into silicon. In a NN you can vary:

  • number of layers
  • connections between units
  • activation functions
  • and many, many other hyperparameters

A NN, once trained on a particular problem, is really fast at making a prediction for a new sample. The slow and computationally heavy task is the training - and it's during training that you need the flexibility to tweak the model and its parameters.
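To make the flexibility point concrete, here is a minimal sketch (assuming a framework such as PyTorch; the helper name and layer sizes are made up for illustration) in which the depth, widths and activation function are ordinary runtime parameters rather than anything fixed in hardware:

```python
import torch.nn as nn

def build_mlp(in_dim, hidden_dims, out_dim, activation=nn.ReLU):
    """Build a fully connected network whose depth, width and
    activation function are all chosen at construction time."""
    layers, prev = [], in_dim
    for h in hidden_dims:
        layers += [nn.Linear(prev, h), activation()]
        prev = h
    layers.append(nn.Linear(prev, out_dim))
    return nn.Sequential(*layers)

# Two very different architectures from the same code path:
small = build_mlp(784, [32], 10, activation=nn.Tanh)
deep  = build_mlp(784, [512, 256, 128], 10, activation=nn.ReLU)
```

Baking a fixed topology into silicon gives up exactly this kind of reconfiguration, which is what you rely on during training.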

You could bake a trained NN model onto a chip if you need prediction latency to be really low, i.e. on the order of nanoseconds (instead of a millisecond or a second on a modern CPU). That has a significant downside - you won't ever be able to update it with a newer NN model.
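Conceptually, a baked model is just a forward pass over frozen constants - something like this rough sketch (NumPy, with made-up weights standing in for values etched into hardware):

```python
import numpy as np

# Frozen parameters, standing in for weights etched into the chip.
W1 = np.array([[0.2, -0.5], [0.8, 0.1]])   # hypothetical values
b1 = np.array([0.0, 0.1])
W2 = np.array([[1.0], [-1.0]])

def predict(x):
    """Fixed-function inference: very fast, but the weights can never change."""
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU layer
    return h @ W2                      # linear output

print(predict(np.array([1.0, 2.0])))
```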

Iliyan Bobev
  • I'm not proposing that a neural network be represented as a network of individual transistors. I'm proposing that it be represented with a massively parallel architecture based on linking together hundreds of thousands of ultra-lightweight chips capable of universal computation, so that they can be repurposed to simulate any type of neuron or any connection architecture that one might desire. – Alecto Irene Perez Jan 09 '18 at 00:31
  • 1
    @JorgePerez It's pretty much what you get with CUDA cores in a modern GPU - highly parallel individual computational units. It differs from a full CPU core, as a full core has overhead features: a processor has its own registers and a way to address the whole memory, while with CUDA it's assumed you execute the same computation on all cores, so the memory is shared. – Iliyan Bobev Jan 09 '18 at 07:10
0

It's been done (essentially). The author at the following link used a series of FPGAs to emulate hundreds of 8080-class CPUs, using them to train a neural network to play Game Boy games. https://towardsdatascience.com/a-gameboy-supercomputer-33a6955a79a4

IBM's TrueNorth, used in DARPA's SyNAPSE program, is also very close to what you suggest. https://en.wikipedia.org/wiki/TrueNorth

Also of interest may be SpiNNaker and Intel Loihi.

  • Interesting experiment. Although it demonstrates the architecture is practical and useful for some purposes, it isn't solving the problem that the OP asked about - making a large NN out of lots of 8080s. A lot of the answers here touch on the problem with that (namely the requirement for shared data, which rules out a set of full processors that communicate); perhaps you could extend this one to capture the difference between the OP's proposal and practical deep learning chips. – Neil Slater Jan 22 '19 at 07:57
0

Theoretically it might be possible, but practically it is not.

You can argue by analogy with a Turing machine: the Intel 8080 is Turing-complete, hence it can run any program, including a neural network, given unlimited time and memory.

In spite of the above, you will face insurmountable challenges in implementing your system.

CPUs are designed to handle calculations sequentially, while most AI workloads are inherently parallel. You need a GPU (or an AI ASIC) to run them in a massively parallel manner to get a significant speedup.
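The core of that parallel workload is matrix multiplication. Here is a quick sketch (NumPy, with made-up sizes) of the same layer computed neuron by neuron versus as a single vectorized product, which is the form a GPU spreads across thousands of cores:

```python
import numpy as np

x = np.random.rand(784)          # one input sample
W = np.random.rand(784, 256)     # weights of a single layer

# Sequential view: one output neuron at a time, 256 separate dot products.
out_seq = np.array([x @ W[:, j] for j in range(W.shape[1])])

# Parallel view: a single matrix-vector product; each of the 256 dot
# products is independent and can run on its own core.
out_par = x @ W

assert np.allclose(out_seq, out_par)
```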

Additionally, GPUs are excellent at floating-point math; floating-point arithmetic works with numbers that have a fractional part and a wide dynamic range, which is key to running neural networks. For example, an Intel Core i7-6700K is capable of about 200 GigaFLOPS (floating-point operations per second), while an NVIDIA GTX 1080 GPU is capable of about 8,900 GigaFLOPS - a significant difference. (Tyler J 2017)
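As a back-of-the-envelope check, assuming a network of about 10 million parameters and roughly 2 FLOPs per weight per forward pass (both figures are assumptions for illustration):

```python
flops_per_forward = 2 * 10_000_000   # assumed model size
cpu_flops = 200e9                    # i7-6700K, ~200 GigaFLOPS (figure above)
gpu_flops = 8900e9                   # GTX 1080, ~8900 GigaFLOPS (figure above)

print(f"CPU: {flops_per_forward / cpu_flops * 1e6:.0f} microseconds per forward pass")
print(f"GPU: {flops_per_forward / gpu_flops * 1e6:.1f} microseconds per forward pass")
# Ignores memory bandwidth, so treat these as optimistic lower bounds.
```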

If you decide to use the Intel 8080 (0.29 MIPS at 2 MHz), you will require millions of processors and billions of dollars just to compute at one GFLOPS. You can follow this link to see the cost of computing over the years: https://en.wikipedia.org/wiki/FLOPS
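To see where "millions of processors" comes from, here is one way to estimate it; the cost per floating-point operation is an assumption (the 8080 has no FPU, so each one takes on the order of hundreds of integer instructions in software):

```python
mips_per_8080 = 0.29e6      # 0.29 MIPS at 2 MHz (figure above)
instr_per_flop = 500        # assumed cost of software floating-point emulation
flops_per_8080 = mips_per_8080 / instr_per_flop   # ~580 FLOPS each

target = 1e9                # one GFLOPS
print(f"8080s needed for 1 GFLOPS: {target / flops_per_8080:,.0f}")
# -> roughly 1.7 million chips, before any interconnect overhead
```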

Another problem concerns RAM. To run a neural network efficiently you need to load it fully into RAM. It will be a huge challenge to squeeze a neural network into the 64 KB address space of an Intel 8080.
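For scale, 64 KB holds very few parameters, even before any code or activations are accounted for:

```python
address_space = 64 * 1024    # bytes addressable by an 8080
bytes_per_weight = 4         # one 32-bit float

print(address_space // bytes_per_weight)   # 16,384 weights at most
# A modest 10-million-parameter model would need ~600 such address
# spaces for its weights alone, plus the cost of keeping them in sync.
```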

Network bandwidth will also be a huge bottleneck. Modern GPUs support high-speed interconnects: for example, NVIDIA's NVLink has a peak speed of around 80 GB/s, while PCI-E 3.0 runs at around 30 GB/s. Without a high-speed interconnect you will not achieve any speedup, despite using a distributed system with many processors.
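For a rough sense of what those numbers mean, here is the time just to move one copy of a model's weights over each link (the 10-million-parameter model size is again an assumption):

```python
model_bytes = 10_000_000 * 4   # assumed 10M float32 parameters = 40 MB

for name, gbps in [("NVLink", 80), ("PCI-E 3.0", 30)]:
    seconds = model_bytes / (gbps * 1e9)
    print(f"{name}: {seconds * 1e3:.1f} ms to exchange one copy of the weights")
# A slow interconnect between millions of tiny chips would have to carry
# this kind of traffic on every training step.
```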

Additionally, you will face significant challenges in programming neural network algorithms for your 8080-based system. Most programmers today follow object-oriented programming practices, which enable code reuse and simplify design and maintenance. Besides, OOP languages such as Java, C++ and Python have libraries that significantly simplify the process of programming a neural network.

When the 8080 was designed back in 1974, OOP had not yet entered mainstream use, and the programming tools of the era - compilers and debuggers - would be considered archaic by today's standards. I mean, good luck debugging that system.

Last but not least, you need big data (or at least a substantial dataset) to train your neural network on. Without training on a large dataset your model will be ineffective. An 8080-based system of the era offered at most a few hundred kilobytes of storage, while even the small MNIST dataset is roughly 55 MB uncompressed, and realistic datasets run to gigabytes or more. This means that your processor cannot support the necessary storage for any ML dataset.
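Even MNIST, tiny by modern standards, dwarfs what a single 8080 can address:

```python
mnist_images = 70_000            # 60k training + 10k test images
bytes_per_image = 28 * 28        # one byte per pixel; labels ignored
dataset_bytes = mnist_images * bytes_per_image   # ~55 MB uncompressed

address_space = 64 * 1024
print(f"MNIST is ~{dataset_bytes / 1e6:.0f} MB, about "
      f"{dataset_bytes // address_space:,} times the 8080's address space")
```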

For the above reasons, my conclusion is that the 8080 processor provides insufficient resources to implement any effective DL algorithm. Networking millions of them together will not provide any substantial speedup for a DL algorithm.

Seth Simba