I'm trying to learn how a neural network works by writing one in C for handwritten digit recognition, trained on the MNIST dataset. The network has an input layer with 28*28 neurons, a hidden layer with 50 neurons, and an output layer with 10 neurons; the activation function is sigmoid, and weights and biases are initialized randomly between -0.5 and +0.5. I implemented forward propagation and backpropagation, but I think I'm doing something wrong: training doesn't work well, the accuracy is not that good, and it sometimes decreases from one epoch to the next. The learning rate is 0.01, and reducing it to 0.001 made almost no difference (the behavior was nearly the same).
Example:
EPOCH: 0 ACCURACY: 63.180000 CORRECT ANSWERS: 6318
EPOCH: 1 ACCURACY: 63.640000 CORRECT ANSWERS: 6364
EPOCH: 2 ACCURACY: 67.730000 CORRECT ANSWERS: 6773
EPOCH: 3 ACCURACY: 68.530000 CORRECT ANSWERS: 6853
EPOCH: 4 ACCURACY: 71.160000 CORRECT ANSWERS: 7116
EPOCH: 5 ACCURACY: 76.120000 CORRECT ANSWERS: 7612
EPOCH: 6 ACCURACY: 81.080000 CORRECT ANSWERS: 8108
EPOCH: 7 ACCURACY: 81.410000 CORRECT ANSWERS: 8141
EPOCH: 8 ACCURACY: 78.440000 CORRECT ANSWERS: 7844
EPOCH: 9 ACCURACY: 77.910000 CORRECT ANSWERS: 7791
EPOCH: 10 ACCURACY: 82.070000 CORRECT ANSWERS: 8207
Here is the backpropagation function:
void BackPropagation(double *input, double *output){
    double hidden_gradients[NUMHIDDEN];
    double output_gradients[NUMOUTPUTS];

    /* Output deltas: (target - prediction) * sigmoid'(prediction).
     * Note dSigmoid receives the already-activated value here. */
    for(int i = 0; i < NUMOUTPUTS; i++)
        output_gradients[i] = (output[i] - output_layer[i]) * dSigmoid(output_layer[i]);

    /* Hidden deltas: backpropagate the output deltas through the output
     * weights. This must happen BEFORE those weights are updated, so the
     * gradients use the same weights as the forward pass. */
    for(int i = 0; i < NUMHIDDEN; i++){
        double sum = 0.0;
        for(int j = 0; j < NUMOUTPUTS; j++)
            sum += output_gradients[j] * output_layer_weights[j][i];
        hidden_gradients[i] = dSigmoid(hidden_layer[i]) * sum;
    }

    /* Update output layer weights and biases. */
    for(int i = 0; i < NUMOUTPUTS; i++){
        for(int j = 0; j < NUMHIDDEN; j++)
            output_layer_weights[i][j] += LR * output_gradients[i] * hidden_layer[j];
        output_layer_bias[i] += LR * output_gradients[i];
    }

    /* Update hidden layer weights and biases. */
    for(int i = 0; i < NUMHIDDEN; i++){
        for(int j = 0; j < NUMINPUTS; j++)
            hidden_layer_weights[i][j] += LR * hidden_gradients[i] * input[j];
        hidden_layer_bias[i] += LR * hidden_gradients[i];
    }
}
Here is the forward propagation function as well:
void ForwardPropagation(double *input){
    /* Hidden layer: weighted sum of inputs plus bias, then sigmoid. */
    for(int i = 0; i < NUMHIDDEN; i++){
        hidden_layer[i] = hidden_layer_bias[i];
        for(int j = 0; j < NUMINPUTS; j++)
            hidden_layer[i] += input[j] * hidden_layer_weights[i][j];
        hidden_layer[i] = sigmoid(hidden_layer[i]);
    }
    /* Output layer: weighted sum of hidden activations plus bias, then sigmoid. */
    for(int i = 0; i < NUMOUTPUTS; i++){
        output_layer[i] = output_layer_bias[i];
        for(int j = 0; j < NUMHIDDEN; j++)
            output_layer[i] += hidden_layer[j] * output_layer_weights[i][j];
        output_layer[i] = sigmoid(output_layer[i]);
    }
}
I would be grateful if someone could help me. Thanks