ASIC stands for Application-Specific Integrated Circuit. Basically, you write code in an HDL (hardware description language) to design a chip. I'll take the cases of how modern computers work to explain my point:
- CPUs - A CPU is basically a microprocessor with many helper ICs performing specific tasks. In a classic microprocessor there is a single arithmetic logic unit (ALU) working with one special register called the accumulator: a value has to be loaded into it, because computations are performed only on values held in the accumulator. Thus every instruction, every operation, every read/write has to pass through the accumulator (that is why older computers used to freeze when you wrote from a file to some device; nowadays the process has been refined and may not require the accumulator to sit in between, specifically with DMA). The small sketch below models this bottleneck in software.
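As a minimal sketch of the idea (in C, purely as a software analogy, not real machine code): a single `acc` variable stands in for the accumulator, and every load, operation, and store is forced through it one step at a time, so nothing can run in parallel.

```c
#include <stdio.h>

/* Toy model of an accumulator-based machine: every arithmetic step must
 * first load a value into the single accumulator, operate on it there,
 * and store it back. The work is fully serialized. */
int main(void) {
    int memory[4] = {3, 5, 7, 9};
    int acc = 0;                      /* the one and only accumulator */

    for (int i = 0; i < 4; i++) {
        acc = memory[i];              /* LOAD: value must enter the accumulator */
        acc = acc * 2;                /* ALU works only on the accumulator */
        memory[i] = acc;              /* STORE: result leaves through it again */
    }

    for (int i = 0; i < 4; i++)
        printf("%d ", memory[i]);     /* prints: 6 10 14 18 */
    printf("\n");
    return 0;
}
```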
Now, ML algorithms need to perform matrix multiplications, which can be easily parallelized, but here we have only a single processing unit at hand; and so came the GPUs.
- GPUs - GPUs have hundreds of processing units, but they lack the general-purpose facilities of a CPU, so they are good for parallelizable calculations. Since there is no memory overlap in matrix multiplication (no two workers manipulate the same part of memory), GPUs work very well for it. Still, since a GPU is not general-purpose, it will only work as fast as the CPU can feed data into its memory. The sketch below shows why the work splits so cleanly.
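Here is a small C sketch of that point: every output element `C[i][j]` is computed from its own row of `A` and column of `B`, and no two iterations write the same cell, so all of them can run independently. The OpenMP pragma is just a stand-in for the GPU's many cores (compile with `-fopenmp`); it is not how a real GPU kernel is written.

```c
#include <stdio.h>

#define N 4

/* Each C[i][j] depends only on row i of A and column j of B, and no two
 * (i, j) pairs write to the same output cell, so all N*N results can be
 * computed independently -- exactly what a GPU's many cores exploit. */
void matmul(const double A[N][N], const double B[N][N], double C[N][N]) {
    #pragma omp parallel for collapse(2)   /* stand-in for GPU-style parallelism */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
}

int main(void) {
    double A[N][N], B[N][N], C[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = i + j;
            B[i][j] = (i == j) ? 1.0 : 0.0;   /* identity, so C should equal A */
        }
    matmul(A, B, C);
    printf("C[2][3] = %.1f\n", C[2][3]);      /* expect 5.0 */
    return 0;
}
```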
- ASIC - An ASIC can be anything: a GPU, a CPU, or a processor of your own design, with whatever amount of memory you want to give it. Say you want to design your own specialized ML processor: design it as an ASIC. Do you want 256-bit floating-point numbers? Create a 256-bit processor. You want your summing to be fast? Implement a parallel adder wider than the ones in conventional processors. You want n cores? No problem. You want to define the data flow from different processing units to different places? You can do that too. Also, with careful planning, you can trade off ASIC area vs. power vs. speed. The only problem is that for all of this you need to create your own standards. Processor design generally follows well-defined standards (the number of pins and their functionality, the IEEE 754 standard for floating-point representation, etc.) that came out of a lot of trial and error. So if you can overcome all of that, you can easily create your own ASIC. A small behavioral sketch of one such custom choice follows below.
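To make the "define your own datapath" point concrete, here is a C behavioral model of a custom 12-bit saturating adder, the kind of non-standard arithmetic unit you are free to specify when you design your own ASIC instead of inheriting an off-the-shelf word size. The 12-bit width and the saturation policy are arbitrary example choices, not anything from a real design.

```c
#include <stdint.h>
#include <stdio.h>

/* Behavioral (software) model of a custom 12-bit saturating adder.
 * In an actual ASIC flow this behavior would be described in an HDL;
 * the width and clamping policy here are arbitrary example choices. */
#define WIDTH    12
#define MAX_VAL  ((1 << WIDTH) - 1)   /* 4095 for an unsigned 12-bit lane */

uint16_t sat_add12(uint16_t a, uint16_t b) {
    uint32_t sum = (uint32_t)(a & MAX_VAL) + (uint32_t)(b & MAX_VAL);
    return (sum > MAX_VAL) ? MAX_VAL : (uint16_t)sum;   /* clamp instead of wrapping */
}

int main(void) {
    printf("%u\n", sat_add12(4000, 200));   /* overflows 4095, so prints 4095 */
    printf("%u\n", sat_add12(100, 200));    /* fits, so prints 300 */
    return 0;
}
```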
I do not know exactly what Google is doing with their TPUs, but apparently they designed some sort of integer and floating-point format for their 8-bit cores, depending on the requirements at hand. They are probably implementing it as an ASIC for power, area, and speed considerations. The sketch below shows the general idea behind mapping floats onto 8-bit integers.
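For illustration only, here is a generic 8-bit affine quantization scheme in C (real value ≈ scale × (q − zero_point)). This is not Google's actual TPU format; it just shows how a floating-point range can be squeezed into 8-bit integers so that the expensive math runs on narrow, cheap hardware.

```c
#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* Generic 8-bit affine quantization: real ~= scale * (q - zero_point).
 * Not Google's actual format -- just the general idea behind 8-bit
 * integer arithmetic in ML accelerators. Compile with -lm. */
typedef struct {
    float   scale;
    int32_t zero_point;
} QParams;

static QParams make_qparams(float min, float max) {
    QParams p;
    p.scale = (max - min) / 255.0f;                 /* 256 representable levels */
    p.zero_point = (int32_t)roundf(-min / p.scale); /* integer that maps to 0.0 */
    return p;
}

static uint8_t quantize(float x, QParams p) {
    int32_t q = (int32_t)roundf(x / p.scale) + p.zero_point;
    if (q < 0)   q = 0;                             /* clamp into the 8-bit range */
    if (q > 255) q = 255;
    return (uint8_t)q;
}

static float dequantize(uint8_t q, QParams p) {
    return p.scale * (float)((int32_t)q - p.zero_point);
}

int main(void) {
    QParams p = make_qparams(-1.0f, 1.0f);
    uint8_t q = quantize(0.5f, p);
    printf("0.5 -> %u -> %.4f\n", q, dequantize(q, p));  /* small rounding error */
    return 0;
}
```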