
I am developing my own neural network in order to learn how they work. I am implementing it in C++ with the Eigen library (for matrix multiplication). I have a working implementation that trains on MNIST quite nicely.

The next thing I wanted to implement was momentum. I believe that I have implemented the algorithm here, but when I run it (with momentum factor = 0.9), after a couple of seconds (something like 100 minibatches on MNIST) Eigen spits out a NaN error during the feedforward pass.

I have looked at the weight values just before the NaN error appears, and some of them are very large and some very small, so I suspect something is generating an INF value which then produces a NaN. For matrix multiplication, is an INF result the only thing that generates a NaN?

Thinking about it more, doesn't the momentum formula effectively sum all of the previous gradients?

Edit: I realise exactly the same question has been asked (but not answered) here.

  • please post the c++ code... also, please use a very basic model, like linear regression, so that you know it's not some weird exploding gradient problem – Alberto Aug 08 '23 at 21:27
  • Please upload the code. Besides that, if there is an implementation problem, please post it on StackOverflow. – CuCaRot Aug 09 '23 at 04:32

0 Answers