
I'm making a custom neural network framework (in C++, if that is of any help). When I train a model on MNIST, depending on how happy the network is feeling, it will either give me 90%+ accuracy or get stuck at around 9-10% (on the validation set).

I shuffle all my data before feeding it to the neural net.

Is there a better randomizer I should be using, or am I perhaps not initializing my weights properly (using srand to generate values between ±0.1)? Did I somehow hit a saddle point?
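
For context, here is a minimal sketch of what I mean by the initialization, rewritten with <random> instead of srand()/rand(): the ±0.1 uniform range I use now, plus a He-style alternative that is a common choice for ReLU layers. The function names and flat weight layout are only illustrative, not my framework's actual API.

    #include <cmath>
    #include <random>
    #include <vector>

    // Current scheme: uniform weights in [-0.1, 0.1], one flat vector per layer.
    std::vector<float> init_uniform(std::size_t n, std::mt19937& rng) {
        std::uniform_real_distribution<float> dist(-0.1f, 0.1f);
        std::vector<float> w(n);
        for (auto& x : w) x = dist(rng);
        return w;
    }

    // He-style initialization: zero-mean normal with std dev sqrt(2 / fan_in),
    // where fan_in is the number of inputs feeding the layer (e.g. 784 for the
    // first hidden layer below).
    std::vector<float> init_he(std::size_t n, std::size_t fan_in, std::mt19937& rng) {
        std::normal_distribution<float> dist(0.0f, std::sqrt(2.0f / fan_in));
        std::vector<float> w(n);
        for (auto& x : w) x = dist(rng);
        return w;
    }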

My network consists of a 784-unit input layer, hidden layers of 256, 64, 32, and 16 neurons, all with ReLU, and a 10-unit softmax output layer.
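
For reference, the softmax at the output is computed roughly like the sketch below (the numerically stable form, subtracting the max logit before exponentiating so exp() cannot overflow); this is illustrative, not my framework's exact code.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Stable softmax: shift logits by their maximum before exponentiating.
    std::vector<float> softmax(const std::vector<float>& logits) {
        float m = *std::max_element(logits.begin(), logits.end());
        std::vector<float> out(logits.size());
        float sum = 0.0f;
        for (std::size_t i = 0; i < logits.size(); ++i) {
            out[i] = std::exp(logits[i] - m);
            sum += out[i];
        }
        for (auto& x : out) x /= sum;
        return out;
    }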

Where should I start investigating based on this kind of behavior, when I can't even replicate what is going on?
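
One thing I can do to make the bad runs reproducible enough to inspect is route every source of randomness (weight initialization and the data shuffle) through explicitly seeded engines. A sketch of the shuffling part, assuming the training samples are addressed by index:

    #include <algorithm>
    #include <numeric>
    #include <random>
    #include <vector>

    int main() {
        const unsigned seed = 1234;        // fixed seed so two runs can be compared
        std::mt19937 shuffle_rng(seed);    // separate engine used only for shuffling

        std::vector<int> indices(60000);   // MNIST training-set indices
        std::iota(indices.begin(), indices.end(), 0);
        std::shuffle(indices.begin(), indices.end(), shuffle_rng);
        // Feed samples in 'indices' order; with all seeds fixed, a run that gets
        // stuck at 10% can be reproduced and stepped through.
        return 0;
    }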

  • This could be due to exploding gradients, maybe. I'd suggest you train with gradient clipping and see how the model learns. – SpiderRico Mar 04 '21 at 21:56 (a sketch of norm-based clipping follows after these comments)
  • @SpiderRico wouldn't exploding gradients produce NaN outputs (and also cause a NaN loss)? My loss and accuracy are both normal-looking values, but I have not checked what the gradients look like, so I will look into it, thank you. – Ilknur Mustafa Mar 04 '21 at 21:58
  • Correct, the loss should be NaN, but you mentioned accuracy getting stuck at 10%. That is equivalent to random guessing for MNIST (as there are 10 classes) and can be caused by exploding gradients. – SpiderRico Mar 04 '21 at 21:59
  • How are you optimising your network? If you're using some form of gradient descent, is the learning rate you're using reasonable? – htl Mar 05 '21 at 09:23
  • If you're using srand for a number between +-0.1, you should verify that none of your weights are initialized to zero. If they are, they will stay there. – David Hoelzer Mar 05 '21 at 11:41
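
Regarding the gradient-clipping suggestion above, a minimal sketch of clipping by global L2 norm, applied to a flat gradient vector just before the weight update (the flat layout is an assumption about the framework, not its real API):

    #include <cmath>
    #include <vector>

    // Rescale the whole gradient if its L2 norm exceeds max_norm;
    // the direction is preserved, only the magnitude shrinks.
    void clip_gradients(std::vector<float>& grad, float max_norm) {
        float sq = 0.0f;
        for (float g : grad) sq += g * g;
        const float norm = std::sqrt(sq);
        if (norm > max_norm) {
            const float scale = max_norm / norm;
            for (float& g : grad) g *= scale;
        }
    }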

0 Answers