
I am currently using an Nvidia GTX 1050 with 640 CUDA cores and 2 GB of GDDR5 for deep neural network training. I want to buy a new GPU for training, but I am not sure how much performance improvement I can expect.

I wonder if there is a way to roughly estimate the training performance improvement just by comparing the GPUs' specifications.

Assuming all training parameters are the same, can I roughly assume that the training performance improves X times if the CUDA core count and memory size increase X times?

For example, is an RTX 2070 with 2304 CUDA cores and 8 GB of GDDR6 roughly 4 times faster than the GTX 1050? And is an RTX 2080 Ti with 4352 CUDA cores and 11 GB of GDDR6 roughly 7 times faster than the GTX 1050?
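For reference, this is the naive spec arithmetic behind those numbers (pure core-count ratios, not a claim about real training speed):

```python
# Naive speed-up guess based only on CUDA core counts (spec arithmetic only).
gtx_1050_cores = 640
rtx_2070_cores = 2304
rtx_2080_ti_cores = 4352

print(rtx_2070_cores / gtx_1050_cores)     # 3.6  -> "roughly 4 times"
print(rtx_2080_ti_cores / gtx_1050_cores)  # 6.8  -> "roughly 7 times"
```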

Thanks.

Lei Xun

1 Answer


A lot matters when comparing GPUs. I will give you a broad overview of the matter (it is not possible to go into exact detail, as a huge number of factors are involved):

  • Cores - More CUDA cores means more parallelism, so more calculations can be done at the same time. But this is of no significance if your algorithm is inherently sequential; in that case extra CUDA cores will not matter. Your library will parallelize what it can and use only that many CUDA cores; the rest will remain idle. (A rough throughput sketch follows this list.)
  • Memory - Memory is useful if you are working on data whose individual instances require a lot of memory (like images). With more memory you can load more data at the same time for the cores to process. If memory is too low, the cores want data but are not getting it (the data available in RAM is the fuel while the cores are the engine; you cannot run a jet on a fuel tank the size of a car's, since you will waste time constantly refilling it). By Machine Learning convention, though, one loads only small mini-batches at a time.
  • Micro-architecture - Lastly, the architecture matters. I do not know the exact details, but NVIDIA's RTX cards are faster for deep learning than GTX cards (for example, they include Tensor Cores, which accelerate mixed-precision training). NVIDIA has two affordable architectures (Pascal for the GTX 10-series and Turing for the RTX 20-series), so even for otherwise identical specs the Turing card will run deep learning faster. For more details you can explore NVIDIA's website on what each product line specialises in; for example, NVIDIA's Quadro P-series is aimed at CAD workloads, and there are also some very high-end GPUs in the Tesla line.
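As a rough illustration of the cores point above, a common back-of-the-envelope spec comparison is peak FP32 throughput, i.e. 2 × CUDA cores × boost clock (the factor 2 counts a fused multiply-add as two operations). The boost clocks below are approximate reference values I am assuming, and the resulting ratios are theoretical peaks; real training speed-ups are usually smaller because memory bandwidth, architecture and the library all get in the way:

```python
# Rough peak FP32 throughput: 2 ops per core per cycle (one FMA) * cores * clock.
# Boost clocks are approximate reference values in MHz (assumed, check your card).
gpus = {
    "GTX 1050":    (640,  1455),
    "RTX 2070":    (2304, 1620),
    "RTX 2080 Ti": (4352, 1545),
}

baseline_tflops = None
for name, (cores, boost_mhz) in gpus.items():
    tflops = 2 * cores * boost_mhz * 1e6 / 1e12   # theoretical peak FP32 TFLOPS
    if baseline_tflops is None:
        baseline_tflops = tflops
    print(f"{name}: ~{tflops:.1f} TFLOPS, ~{tflops / baseline_tflops:.1f}x the GTX 1050")
```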

So AFAIK, these are the factors that matter. The library you use also matters, as a lot depends on how the library unrolls your program and maps it onto the GPU(s) (a minimal TensorFlow placement sketch is included below the links). Also related are these 2 answers I previously gave:

CPU preferences and specifications for a multi GPU deep-learning setup

Does fp32 & fp64 performance of GPU affect deep learning model training?

Hope this helps!

  • Thanks for the reply. I wonder how we can know how many CUDA cores are used for a given task? @DuttaA – Lei Xun Apr 10 '19 at 09:53
  • @LeiXun I think it depends on the framework and how you write the code. I don't know whether there is a way to see how many cores TensorFlow is using at a particular moment, so you will have to check that out; otherwise you have to check GPU usage with some external software. –  Apr 10 '19 at 10:16
  • Thanks a lot. A few more questions: what is stored in GPU memory while training? E.g. how do I know how many images are stored in GPU memory for training, and is this number affected by the batch size I set? And is the entire model (i.e. the trainable parameters) stored in GPU memory? – Lei Xun Apr 10 '19 at 10:43
  • @LeiXun All of these depend on the libraries, but of course you cannot exceed the GPU memory limit. And yes, in the ideal case all the trainable parameters should be stored in GPU memory. It's tough to say more without knowing the workings of the library or the model size. (A rough memory estimate is sketched after these comments.) –  Apr 10 '19 at 10:47
  • Do you mean libraries like CUDA and cuDNN, or do you mean something else by libraries? – Lei Xun Apr 10 '19 at 10:55
  • @LeiXun Libraries like TensorFlow. I think CUDA pretty much gives you the opportunity to do what you want (not sure though), but it is up to the libraries to decide how to use it. For example, you can explicitly map your TensorFlow model onto different GPUs; other than that it does not really provide much customisation. You could write your own library (a humongous task) or modify TensorFlow if you want that level of micro control. In general it hardly matters, unless you are doing some really large tasks. –  Apr 10 '19 at 10:59
  • @LeiXun In the 2nd link I have linked to a blog which shows a performance comparison of various GPUs, and you'll easily see that the number of cores and the amount of memory do not scale speed linearly, as we like to think. Lots of other factors are involved. –  Apr 10 '19 at 11:00
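Regarding the GPU-memory comments above, here is a very rough sketch of the kind of estimate involved. The parameter count, image size and the 4x optimizer-state multiplier are illustrative assumptions; real usage depends on the framework, activations saved for backprop and cuDNN workspaces:

```python
# Very rough fp32 training-memory estimate (illustrative numbers only).
bytes_per_float = 4

num_parameters = 25_000_000                     # assumed mid-sized CNN
batch_size = 32
image_bytes = 224 * 224 * 3 * bytes_per_float   # one fp32 input image

weights = num_parameters * bytes_per_float
# Parameters + gradients + optimizer state (Adam keeps two extra moment tensors).
training_state = weights * 4
input_batch = batch_size * image_bytes

print(f"weights + grads + optimizer ~ {training_state / 1e9:.2f} GB")
print(f"one input batch             ~ {input_batch / 1e6:.1f} MB")
print("plus activations saved for backprop (scale with batch size) and workspaces")
```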