
I'm trying to learn how a neural network works, so I'm writing one in C for handwritten digit recognition and training it on the MNIST dataset. The network has an input layer with 28*28 neurons, a hidden layer with 50 neurons, and an output layer with 10 neurons. The activation function is sigmoid, and the weights and biases are initialized randomly between -0.5 and +0.5.

I implemented forward propagation and back propagation, but I think I'm doing something wrong: training doesn't seem to work well, because the accuracy is not that good and it sometimes decreases during training. The learning rate is set to 0.01, and reducing it to 0.001 didn't change much (the behavior was almost the same). Example:
EPOCH: 0 ACCURACY: 63.180000 CORRECT ANSWERS: 6318
EPOCH: 1 ACCURACY: 63.640000 CORRECT ANSWERS: 6364
EPOCH: 2 ACCURACY: 67.730000 CORRECT ANSWERS: 6773
EPOCH: 3 ACCURACY: 68.530000 CORRECT ANSWERS: 6853
EPOCH: 4 ACCURACY: 71.160000 CORRECT ANSWERS: 7116
EPOCH: 5 ACCURACY: 76.120000 CORRECT ANSWERS: 7612
EPOCH: 6 ACCURACY: 81.080000 CORRECT ANSWERS: 8108
EPOCH: 7 ACCURACY: 81.410000 CORRECT ANSWERS: 8141
EPOCH: 8 ACCURACY: 78.440000 CORRECT ANSWERS: 7844
EPOCH: 9 ACCURACY: 77.910000 CORRECT ANSWERS: 7791
EPOCH: 10 ACCURACY: 82.070000 CORRECT ANSWERS: 8207
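
For reference, here is roughly what my activation helpers and initialization look like (a sketch; RandomWeight is just an illustrative name for how I fill the weights and biases with values in [-0.5, +0.5]):

#include <math.h>
#include <stdlib.h>

#define NUMINPUTS  784   /* 28*28 input pixels */
#define NUMHIDDEN  50
#define NUMOUTPUTS 10
#define LR 0.01

double sigmoid(double x){ return 1.0/(1.0+exp(-x)); }

/* derivative in terms of the already-activated value a = sigmoid(x),
   which is why BackPropagation passes output_layer[i] and hidden_layer[i] */
double dSigmoid(double a){ return a*(1.0-a); }

/* random value in [-0.5, +0.5] */
double RandomWeight(void){ return (double)rand()/RAND_MAX - 0.5; }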

Here is the back propagation function:

void BackPropagation(double *input, double *output){
    /* "output" holds the target (one-hot) vector for the current image */
    double hidden_gradients[NUMHIDDEN];
    double output_gradients[NUMOUTPUTS];

    /* output deltas: (target - prediction) * sigmoid'(activation) */
    for(int i = 0; i < NUMOUTPUTS; i++)
        output_gradients[i] = (output[i] - output_layer[i]) * dSigmoid(output_layer[i]);

    /* hidden deltas: backpropagate the output deltas through the output weights;
       computed before those weights are updated, so the error flows through the
       same weights that were used in the forward pass */
    for(int i = 0; i < NUMHIDDEN; i++){
        double sum = 0.0;
        for(int j = 0; j < NUMOUTPUTS; j++)
            sum += output_gradients[j] * output_layer_weights[j][i];
        hidden_gradients[i] = dSigmoid(hidden_layer[i]) * sum;
    }

    /* update output layer weights and biases */
    for(int i = 0; i < NUMOUTPUTS; i++){
        for(int j = 0; j < NUMHIDDEN; j++)
            output_layer_weights[i][j] += LR * output_gradients[i] * hidden_layer[j];
        output_layer_bias[i] += LR * output_gradients[i];
    }

    /* update hidden layer weights and biases */
    for(int i = 0; i < NUMHIDDEN; i++){
        for(int j = 0; j < NUMINPUTS; j++)
            hidden_layer_weights[i][j] += LR * hidden_gradients[i] * input[j];
        hidden_layer_bias[i] += LR * hidden_gradients[i];
    }
}
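
These updates are doing gradient descent on the squared error between the target and the network output. To monitor it, a small helper along these lines could be used (a sketch; SampleLoss is a hypothetical name, and target is the one-hot encoded label for the current image):

double SampleLoss(double *target){
    /* mean squared error over the 10 outputs for one image */
    double loss = 0.0;
    for(int i = 0; i < NUMOUTPUTS; i++){
        double err = target[i] - output_layer[i];
        loss += err*err;
    }
    return loss/NUMOUTPUTS;
}

Averaging this over all the training images in an epoch gives a mean loss that should go down on average if training is working.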

Here is also the forward propagation function:

void ForwardPropagation(double *input){

    /* hidden layer: weighted sum of the inputs plus bias, then sigmoid */
    for(int i = 0; i < NUMHIDDEN; i++){
        hidden_layer[i] = hidden_layer_bias[i];
        for(int j = 0; j < NUMINPUTS; j++)
            hidden_layer[i] += input[j] * hidden_layer_weights[i][j];
        hidden_layer[i] = sigmoid(hidden_layer[i]);
    }

    /* output layer: weighted sum of the hidden activations plus bias, then sigmoid */
    for(int i = 0; i < NUMOUTPUTS; i++){
        output_layer[i] = output_layer_bias[i];
        for(int j = 0; j < NUMHIDDEN; j++)
            output_layer[i] += hidden_layer[j] * output_layer_weights[i][j];
        output_layer[i] = sigmoid(output_layer[i]);
    }
}
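
And this is roughly how everything is called per training image (a sketch; TrainEpoch, inputs, and labels are illustrative stand-ins for my MNIST loading code):

void TrainEpoch(double inputs[][NUMINPUTS], int *labels, int num_samples){
    double target[NUMOUTPUTS];
    for(int n = 0; n < num_samples; n++){
        /* one-hot encode the label: 1.0 for the correct digit, 0.0 elsewhere */
        for(int i = 0; i < NUMOUTPUTS; i++)
            target[i] = (i == labels[n]) ? 1.0 : 0.0;
        ForwardPropagation(inputs[n]);      /* fills hidden_layer and output_layer */
        BackPropagation(inputs[n], target); /* one stochastic gradient step */
    }
}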

I would be grateful if someone could help me. Thanks

Faby
  • 82% seems like a good accuracy, what is the issue? Just do more epochs. – Lelouch Jul 14 '23 at 13:51
  • Is it normal that the accuracy sometimes decreases? I was wondering if my implementation was wrong, because I also tried a version I found online and its accuracy never decreased. – Faby Jul 14 '23 at 14:05
  • It can happen, as the training is usually done in a stochastic way, and you are not optimizing on a smooth surface either. However, it should increase on average; otherwise you probably have an error or a learning rate that leads to divergence... Also, here you are only showing the accuracy, not the loss. The network is only aware of the loss, not the accuracy. – Lelouch Jul 14 '23 at 14:08
  • OK, thanks. I will try more epochs and I will try to compute the loss. – Faby Jul 14 '23 at 14:12
  • You already have the loss; otherwise how could you train it? – Lelouch Jul 14 '23 at 14:18
  • Oh yes, it's `target_output[i]-output_layer[i]`, right? That should be only for one input image; is the total loss the sum of the losses over every input? – Faby Jul 14 '23 at 14:38
  • Usually we look at the mean loss, so that the loss value does not depend on the batch size or epoch size. – Lelouch Jul 14 '23 at 17:58
  • I reviewed the code again and, after fixing some errors (for example, computing the activations immediately in the forward propagation instead of when I needed them), I was able to get ~95% accuracy on the test inputs and ~96% accuracy on the training inputs. Thanks again for the help. – Faby Jul 14 '23 at 19:03

0 Answers