I am purchasing a Titan RTX GPU. Everything seems fine with it except the float32 & float64 performance, which seems lower than that of some of its counterparts. I wanted to understand whether the single-precision and double-precision performance of a GPU affects deep learning training or efficiency. We work mostly with images, but are not limited to that.
2 Answers
First off, I would like to point to this comprehensive blog post, which compares all kinds of NVIDIA GPUs.
The most popular deep learning library, TensorFlow, uses 32-bit floating-point precision by default. This choice is made for two reasons:
- Lower memory requirements
- Faster calculations
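For reference, you can check or change this default through TensorFlow's Keras backend; a minimal sketch:

```python
import tensorflow as tf

# TensorFlow's global default float type is float32
print(tf.keras.backend.floatx())   # 'float32'

# Tensors built from Python floats also default to float32
x = tf.constant([1.0, 2.0, 3.0])
print(x.dtype)                     # tf.float32

# You can opt into float64 globally, at a memory and speed cost
tf.keras.backend.set_floatx('float64')
```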
64-bit is only marginally better than 32-bit: its main advantage is that very small gradient values can still be propagated back to the earliest layers. But the trade-off between that gain and the cost (slower calculations, higher memory requirements, and the many extra epochs needed before those tiny gradients actually do anything) is not worth it. There are also state-of-the-art CNN architectures that inject gradient signal at intermediate points of the network and achieve very good performance.
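To make the gradient-underflow point concrete, here is a small NumPy sketch (the value is purely illustrative):

```python
import numpy as np

# Gradients below float32's smallest subnormal (~1.4e-45) underflow to zero,
# while float64 can still represent them
tiny_grad = 1e-46
print(np.float32(tiny_grad))  # 0.0    -- lost in float32
print(np.float64(tiny_grad))  # 1e-46  -- still representable in float64
```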
So overall, 32-bit performance is what really matters for deep learning, unless you are doing an extremely high-precision job (and even then it would hardly matter, since the small differences gained from a 64-bit representation are essentially erased by any softmax or sigmoid). 64-bit might increase your classification accuracy by $\ll 1\%$, and that would only become significant over very large datasets.
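To illustrate the point about the sigmoid, a small NumPy sketch with an arbitrary logit value:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A logit stored in float32 keeps only ~7 significant digits
z = 5.1234567891234567        # what float64 can hold
z32 = float(np.float32(z))    # what float32 actually stores

print(sigmoid(z), sigmoid(z32))        # both ~0.99408
print(abs(sigmoid(z) - sigmoid(z32)))  # on the order of 1e-9: the extra precision is effectively erased
```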
As far as raw specs go, comparing the TITAN RTX with the 2080 Ti: the TITAN will perform better in fp64 (it has double the memory of the 2080 Ti, along with higher clock speeds, bandwidth, etc.), but a more practical approach would be to couple two 2080 Tis together, which gives much better performance for the price.
Side note: good GPUs require good CPUs. It is difficult to tell whether a given CPU will bottleneck a GPU, as it depends entirely on how the training is performed (whether the data is fully loaded onto the GPU before training, or the CPU continuously feeds data during training; see the input-pipeline sketch below for the latter case). Here are a few links explaining the problem:
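For the continuous-feeding case, a minimal tf.data sketch (the `preprocess` function and image path are hypothetical) shows the usual way to overlap CPU-side data preparation with GPU compute:

```python
import tensorflow as tf

# Hypothetical preprocessing for illustration only
def preprocess(path):
    image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    return tf.image.resize(image, [224, 224]) / 255.0

dataset = (
    tf.data.Dataset.list_files("images/*.jpg")              # hypothetical path
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)   # parallelize CPU-side decoding
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)                             # prepare the next batch while the GPU trains
)
```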