
I made my first neural net in C++ without any libraries. It recognizes digits from the MNIST dataset. With a 784-784-10 architecture, sigmoid activations, and 5 epochs over all 60,000 training samples, it took about 2 hours to train. It was probably slow anyway, because I trained it on a laptop and used classes for Neurons and Layers.

To be honest, I've never used TensorFlow, so I'd like to know how the performance of my net compares to the same net in TensorFlow. Nothing too specific, just a rough approximation.


2 Answers


I'd like to know how the performance of my net compares to the same net in TensorFlow. Nothing too specific, just a rough approximation.

This is very hard to answer in specific terms, because benchmarking is hard and is often done wrong.
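As a small illustration of why (the shapes and the operation here are arbitrary, just for the example): naive timings are easily skewed by one-off setup costs, so at a minimum you want a warm-up run and an average over several repeats:

```python
import time
import tensorflow as tf

x = tf.random.normal((60000, 784))   # MNIST-sized batch of flattened images
w = tf.random.normal((784, 784))

_ = tf.matmul(x, w).numpy()          # warm-up: the first call pays one-off setup costs

runs = 10
start = time.perf_counter()
for _ in range(runs):                # average over repeats instead of trusting a single run
    # .numpy() forces the result to materialise before the clock stops
    _ = tf.matmul(x, w).numpy()
print(f"mean matmul time: {(time.perf_counter() - start) / runs * 1000:.1f} ms")
```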

The main point of TensorFlow, as I see it, is to make it easier for you to use a GPU, and it also gives you access to a large supply of programs written in Python/JavaScript that still deliver C++-level performance.
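As a rough sketch of what that means in practice (the shapes are arbitrary): the Python you write is only a thin dispatch layer, and the actual work runs in TensorFlow's compiled C++/CUDA kernels, on a GPU if one is visible:

```python
import tensorflow as tf

# Which accelerators can TensorFlow see?
print("GPUs visible:", tf.config.list_physical_devices("GPU"))

a = tf.random.normal((4096, 4096))
b = tf.random.normal((4096, 4096))

# The Python call is just a dispatch; the multiply itself runs in a compiled
# C++/CUDA kernel and is placed on the GPU automatically when one is available.
c = tf.matmul(a, b)
print(c.shape)
```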

How fast is TensorFlow compared to self-written neural nets?

This answers the general question of using TensorFlow/PyTorch vs a custom solution, rather than your specific question of how much of a speed-up you'd get.

There is a relatively recent MIT paper, Differentiable Programming for Image Processing and Deep Learning in Halide, that discusses the trade-offs between performance, flexibility, and time spent coding a solution across three different languages.

Specifically, they compared a solution in their own language, Halide, against PyTorch and CUDA.

Consider the following example. A recent neural network-based image processing approximation algorithm was built around a new “bilateral slicing” layer based on the bilateral grid [Chen et al. 2007; Gharbi et al. 2017]. At the time it was published, neither PyTorch nor TensorFlow was even capable of practically expressing this computation. As a result, the authors had to define an entirely new operator, written by hand in about 100 lines of CUDA for the forward pass and 200 lines more for its manually-derived gradient (Fig. 2, right). This was a sizeable programming task which took significant time and expertise. While new operations—added in just the last six months before the submission of this paper—now make it possible to implement this operation in 42 lines of PyTorch, this yields less than 1/3rd the performance on small inputs and runs out of memory on realistically-sized images (Fig. 2, middle). The challenge of efficiently deriving and computing gradients for custom nodes remains a serious obstacle to deep learning.

So in general you'll probably get better performance from TensorFlow/PyTorch than from a custom C++ implementation, but in specific cases, if you have CUDA knowledge on top of C++, you will be able to write more performant programs.
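If you want the rough approximation you asked for, the closest TensorFlow counterpart to your net would look something like the sketch below. The sigmoid output layer, MSE loss, plain SGD, and batch size are my guesses at mirroring your C++ training loop, since you didn't say which loss or optimiser you used; timing `fit` on your laptop and comparing against your 2 hours would give the ballpark figure.

```python
import time
import tensorflow as tf

# Load MNIST and flatten the 28x28 images into 784-dimensional vectors.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
y_train = tf.keras.utils.to_categorical(y_train, 10)   # one-hot targets for an MSE loss

# The same 784-784-10 sigmoid architecture as the hand-written C++ net.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(784, activation="sigmoid"),
    tf.keras.layers.Dense(10, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="mse", metrics=["accuracy"])

# 5 epochs over all 60,000 samples, as in the question.
# batch_size=32 is an assumption; a hand-written net often updates per sample.
start = time.perf_counter()
model.fit(x_train, y_train, epochs=5, batch_size=32)
print(f"Training took {time.perf_counter() - start:.0f} s")
```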

icc97

A lot. There are all sorts of optimizations that you might not have thought of, like fusing layers and functions together. I'm a PyTorch guy though; it's clean and doesn't get in your way the way TensorFlow does.
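As a minimal sketch of what that looks like on the PyTorch side (the architecture is copied from the question; torch.compile is just one example of the kind of fusion mentioned above and needs PyTorch 2.0 or newer):

```python
import torch
import torch.nn as nn

# The question's 784-784-10 sigmoid architecture, written as a PyTorch module.
model = nn.Sequential(
    nn.Linear(784, 784),
    nn.Sigmoid(),
    nn.Linear(784, 10),
    nn.Sigmoid(),
)

# On PyTorch 2.0+, torch.compile can fuse operations into faster kernels.
try:
    model = torch.compile(model)
except AttributeError:
    pass  # older PyTorch versions simply run the eager model

x = torch.randn(32, 784)   # a batch of 32 flattened MNIST-sized inputs
print(model(x).shape)      # torch.Size([32, 10])
```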