
I'm making a custom neural network framework (in C++, if that is of any help). When I train a model on MNIST, depending on how happy the network is feeling, it will either give me 90%+ accuracy or get stuck at around 9-10% (on the validation set).

I shuffle all my data before feeding it to the neural net.

Is there a better randomizer I should be using, or am I perhaps not initializing my weights properly (using srand to generate values between ±0.1)? Did I somehow hit a saddle point?
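
For context, here is a minimal sketch of what I mean by the initialization, rewritten with <random> instead of srand()/rand(): the ±0.1 uniform range I use now, plus a He-style alternative that is a common choice for ReLU layers. The function names and flat weight layout are only illustrative, not my framework's actual API.

    #include <cmath>
    #include <random>
    #include <vector>

    // Current scheme: uniform weights in [-0.1, 0.1], one flat vector per layer.
    std::vector<float> init_uniform(std::size_t n, std::mt19937& rng) {
        std::uniform_real_distribution<float> dist(-0.1f, 0.1f);
        std::vector<float> w(n);
        for (auto& x : w) x = dist(rng);
        return w;
    }

    // He-style initialization: zero-mean normal with std dev sqrt(2 / fan_in),
    // where fan_in is the number of inputs feeding the layer (e.g. 784 for the
    // first hidden layer below).
    std::vector<float> init_he(std::size_t n, std::size_t fan_in, std::mt19937& rng) {
        std::normal_distribution<float> dist(0.0f, std::sqrt(2.0f / fan_in));
        std::vector<float> w(n);
        for (auto& x : w) x = dist(rng);
        return w;
    }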

My network consists of a 784-unit input layer, hidden layers of 256, 64, 32, and 16 neurons, all with ReLU, and a 10-unit softmax output layer.
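
For reference, the softmax at the output is computed roughly like the sketch below (the numerically stable form, subtracting the max logit before exponentiating so exp() cannot overflow); this is illustrative, not my framework's exact code.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Stable softmax: shift logits by their maximum before exponentiating.
    std::vector<float> softmax(const std::vector<float>& logits) {
        float m = *std::max_element(logits.begin(), logits.end());
        std::vector<float> out(logits.size());
        float sum = 0.0f;
        for (std::size_t i = 0; i < logits.size(); ++i) {
            out[i] = std::exp(logits[i] - m);
            sum += out[i];
        }
        for (auto& x : out) x /= sum;
        return out;
    }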

Where should I start investigating based on this kind of behavior, when I can't even replicate what is going on?
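
One thing I can do to make the bad runs reproducible enough to inspect is route every source of randomness (weight initialization and the data shuffle) through explicitly seeded engines. A sketch of the shuffling part, assuming the training samples are addressed by index:

    #include <algorithm>
    #include <numeric>
    #include <random>
    #include <vector>

    int main() {
        const unsigned seed = 1234;        // fixed seed so two runs can be compared
        std::mt19937 shuffle_rng(seed);    // separate engine used only for shuffling

        std::vector<int> indices(60000);   // MNIST training-set indices
        std::iota(indices.begin(), indices.end(), 0);
        std::shuffle(indices.begin(), indices.end(), shuffle_rng);
        // Feed samples in 'indices' order; with all seeds fixed, a run that gets
        // stuck at 10% can be reproduced and stepped through.
        return 0;
    }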

  • This could be due to exploding gradients, maybe. I'd suggest you train with gradient clipping and see how the model learns. – SpiderRico Mar 04 '21 at 21:56 (a sketch of norm-based clipping follows after these comments)
  • @SpiderRico wouldn't exploding gradients produce NaN outputs (and also cause a NaN loss)? My loss and accuracy are both normal-looking values, but I have not checked what the gradients look like, so I will look into it, thank you. – Ilknur Mustafa Mar 04 '21 at 21:58
  • Correct, the loss should be NaN, but you mentioned accuracy getting stuck at 10%. That is equivalent to random guessing for MNIST (as there are 10 classes) and can be caused by exploding gradients. – SpiderRico Mar 04 '21 at 21:59
  • How are you optimising your network? If you're using some form of gradient descent, is the learning rate you're using reasonable? – htl Mar 05 '21 at 09:23
  • If you're using srand for a number between +-0.1, you should verify that none of your weights are initialized to zero. If they are, they will stay there. – David Hoelzer Mar 05 '21 at 11:41
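
Regarding the gradient-clipping suggestion above, a minimal sketch of clipping by global L2 norm, applied to a flat gradient vector just before the weight update (the flat layout is an assumption about the framework, not its real API):

    #include <cmath>
    #include <vector>

    // Rescale the whole gradient if its L2 norm exceeds max_norm;
    // the direction is preserved, only the magnitude shrinks.
    void clip_gradients(std::vector<float>& grad, float max_norm) {
        float sq = 0.0f;
        for (float g : grad) sq += g * g;
        const float norm = std::sqrt(sq);
        if (norm > max_norm) {
            const float scale = max_norm / norm;
            for (float& g : grad) g *= scale;
        }
    }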

0 Answers