10

We can read on the Wikipedia page that Google built a custom ASIC chip for machine learning, tailored for TensorFlow, which helps accelerate AI.

Since ASIC chips are custom-built for one particular use, without the ability to change their circuits afterwards, there must be some fixed algorithm which is invoked.

So how exactly does the acceleration of AI with ASIC chips work if the algorithm cannot be changed? Which part of it exactly is being accelerated?

kenorb
  • 10,423
  • 3
  • 43
  • 91
  • 1
    No mainstream AI technique that I'm aware of requires modification of the *algorithm*, though most depend on the ability to modify *data* (connection strengths, population members, etc.). – NietzscheanAI Aug 17 '16 at 12:00
  • So the only dynamic part, like the state of the network, is kept on some flash memory or drive? – kenorb Aug 17 '16 at 12:13
  • According to https://en.wikipedia.org/wiki/Application-specific_integrated_circuit, modern ASICs can have RAM... – NietzscheanAI Aug 17 '16 at 12:16

4 Answers

5

Tensor operations

The major work in most ML applications is simply a set of (very large) tensor operations, e.g. matrix multiplication. You can do that easily in an ASIC, and all the other algorithms can just run on top of that.
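A minimal sketch of the idea, using NumPy as a software stand-in for the hardware (an illustration, not Google's actual design): the chip bakes in one fixed primitive, and very different algorithms are all expressed on top of it.

```python
import numpy as np

def fixed_matmul(a, b):
    """Stand-in for the one fixed operation the ASIC implements in silicon."""
    return a @ b

x = np.random.rand(4, 8)   # input batch
w = np.random.rand(8, 3)   # layer weights

# Very different "algorithms" reuse the same fixed primitive:
dense  = fixed_matmul(x, w)                         # a neural-network layer
gram   = fixed_matmul(x, x.T)                       # a kernel/similarity matrix
deeper = fixed_matmul(dense, np.random.rand(3, 2))  # another layer stacked on top
```

What changes between models is the data fed to the primitive (weights, inputs), not the primitive itself.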

Peteris
  • 883
  • 5
  • 8
  • 1
    An important point is that the TPU uses 8-bit multiplication, which can be implemented much more efficiently than the wider multiplication offered by a CPU. Such low precision is sufficient and allows packing many thousands of such multipliers on a single chip. – maaartinus Mar 27 '18 at 01:10
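To make the comment concrete, here is a rough sketch of 8-bit quantization; the per-tensor scale factor is a simplified assumption of mine, not the TPU's actual scheme.

```python
import numpy as np

def quantize_int8(x):
    # Per-tensor scale: a deliberately simplified scheme.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

a = np.random.randn(64, 64).astype(np.float32)
b = np.random.randn(64, 64).astype(np.float32)
qa, sa = quantize_int8(a)
qb, sb = quantize_int8(b)

# Hardware multiplies small integers and accumulates in a wider type;
# the float result is recovered from the scales afterwards.
approx = (qa.astype(np.int32) @ qb.astype(np.int32)) * (sa * sb)
print(np.abs(approx - a @ b).max())  # small quantization error
```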
3

I think the algorithm has changed minimally, but the necessary hardware has been trimmed to the bone.

The number of gate transitions is reduced (perhaps float ops and precision too), as is the number of data-move operations, thus saving both power and runtime. Google suggests their TPU achieves a 10x cost saving for the same work done.

https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html
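A back-of-envelope illustration of the data-movement point (my own numbers, not figures from the linked post):

```python
# Bytes moved to stream one 256x256 weight matrix through the chip,
# at different precisions (illustrative sizes, not TPU specifics):
rows, cols = 256, 256
for name, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{name}: {rows * cols * nbytes} bytes")
# int8 moves 4x less data than float32 for the same matrix,
# before even counting the cheaper multiplier circuits.
```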

Randy
  • 679
  • 4
  • 6
2

ASIC stands for Application-Specific Integrated Circuit. Basically, you write programs in an HDL (hardware description language) to design a chip. I'll use how modern computers work to explain my point:

  • CPUs - A CPU is basically a microprocessor with many helper ICs performing specific tasks. In a simple microprocessor there is only a single arithmetic unit, built around a register called the accumulator, in which a value has to be stored: computations are performed only on the values held in the accumulator. Thus every instruction, every operation, every read/write has to go through the accumulator (that is why older computers used to freeze when you wrote from a file to some device, although nowadays the process has been refined and may not require the accumulator to come in between, specifically with DMA). Now, ML algorithms need to perform matrix multiplications, which are easily parallelized, but a CPU has only a handful of processing units, and so came GPUs.
  • GPUs - GPUs have hundreds of processing units, but they lack the multipurpose facilities of a CPU, so they are good for parallelizable calculations. Since there is no memory overlap (the same part of memory being manipulated by two processes) in matrix multiplication, GPUs work very well there (see the sketch after this list). However, since a GPU is not multi-functional, it will only work as fast as the CPU feeds data into its memory.
  • ASIC - An ASIC can be anything: a GPU, a CPU, or a processor of your own design, with any amount of memory you want to give it. Say you want to design your own specialized ML processor: design it as an ASIC. Do you want 256-bit FP numbers? Create a 256-bit processor. Do you want your summing to be fast? Implement a parallel adder with a wider word size than conventional processors. Do you want n cores? No problem. Do you want to define the data flow from different processing units to different places? You can do it. Also, with careful planning you can trade off ASIC area vs. power vs. speed. The only problem is that for all of this you need to create your own standards. Generally, well-defined standards are followed in designing processors, such as the number of pins and their functionality, or the IEEE 754 standard for floating-point representation, which came about after lots of trial and error. So if you can overcome all of that, you can easily create your own ASIC.
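To make the "no memory overlap" point from the GPU bullet concrete, here is a sketch: every output cell of a matrix multiply depends only on one row of A and one column of B, so independent units can each compute one cell without ever writing to the same location. (Threads here merely simulate the parallel hardware units.)

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_matmul(a, b):
    n, m = a.shape[0], b.shape[1]
    out = np.empty((n, m))

    def one_cell(ij):
        i, j = ij
        # Reads a row and a column, writes one distinct cell:
        # no two tasks ever touch the same output location.
        out[i, j] = a[i, :] @ b[:, j]

    with ThreadPoolExecutor() as pool:
        list(pool.map(one_cell, [(i, j) for i in range(n) for j in range(m)]))
    return out

a, b = np.random.rand(8, 5), np.random.rand(5, 6)
assert np.allclose(parallel_matmul(a, b), a @ b)
```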

I do not know what Google is doing with their TPUs, but apparently they designed some sort of integer and FP standard for their 8-bit cores, depending on the requirements at hand. They are probably implementing it as an ASIC for power, area, and speed considerations.

Faizy
  • 1,074
  • 1
  • 6
  • 30
0

Low precision enables highly parallel computation in convolutional and fully connected layers. CPUs and GPUs have a fixed architecture, but an ASIC or FPGA can be designed around the neural network's architecture.
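A toy illustration of that last point (the layer shapes below are hypothetical):

```python
# Hypothetical layer shapes (out_features, in_features) for some network:
layers = [(256, 784), (128, 256), (10, 128)]

# An ASIC/FPGA designer can size the multiply-accumulate array to the
# network itself; a CPU/GPU gives you a fixed width no matter what:
mac_array_width = max(in_features for _, in_features in layers)
print("MAC array width tailored to this network:", mac_array_width)  # 784
```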

mahinlma
  • 101