
I've been trying to implement my own neural network library and have been wondering if:

  • The SSE loss function includes the errors of all the training examples in the mini-batch (each training example's loss is summed into one big loss for the batch)

  • The MSE loss function takes the loss of each individual training example in the mini-batch and averages them over the mini-batch size.

or if the summing or averaging of the weight gradients has nothing to do with the loss function used?

I feel like the answer would be clear if I knew other loss functions. If it does matter, does that mean the weight gradients should be averaged for MSE and summed for SSE in mini-batch gradient descent?
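To make the question concrete, here is a minimal sketch (assuming NumPy and a toy linear model, neither of which is part of the question) that computes both losses and their gradients for one mini-batch:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 8                                  # mini-batch size
X = rng.normal(size=(m, 3))            # m examples, 3 features (toy data)
w = rng.normal(size=3)                 # weights of the toy linear model
y = rng.normal(size=m)                 # targets

err = X @ w - y                        # prediction error per example
sse = np.sum(err ** 2)                 # SSE: squared errors summed over the batch
mse = sse / m                          # MSE: the same sum, divided by the batch size

grad_sse = 2 * X.T @ err               # gradient of SSE w.r.t. w
grad_mse = grad_sse / m                # gradient of MSE w.r.t. w (same, scaled by 1/m)

# Per-example gradient of (x_i . w - y_i)^2 is 2 * err_i * x_i.
per_example = 2 * err[:, None] * X     # shape (m, 3)
assert np.allclose(per_example.sum(axis=0), grad_sse)    # summing  -> SSE gradient
assert np.allclose(per_example.mean(axis=0), grad_mse)   # averaging -> MSE gradient
```

The asserts pass: summing the per-example gradients reproduces the SSE gradient, and averaging them reproduces the MSE gradient, so the two only differ by a factor of 1/m.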

1 Answer


What you stated looks correct:

The SSE loss function includes the errors of all the training examples in the mini-batch (each training example's loss is summed into one big loss for the batch):

$$\text{SSE} = \sum_{i=1}^{m} (y_i - y'_i)^2$$

summed over all $m$ samples in the mini-batch.

The MSE loss function takes the loss of each individual training example in the mini-batch and averages them over the mini-batch size:

$$\text{MSE} = \frac{1}{m}\,\text{SSE} = \frac{1}{m}\sum_{i=1}^{m}(y_i - y'_i)^2$$
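One step worth spelling out, since it follows directly from the formula above: the gradient is a linear operator, so the $1/m$ factor carries through to the weight gradients. Averaging the per-example gradients corresponds to MSE, and summing them corresponds to SSE:

$$\nabla_w\,\text{MSE} = \nabla_w\!\left(\frac{1}{m}\,\text{SSE}\right) = \frac{1}{m}\,\nabla_w\,\text{SSE} = \frac{1}{m}\sum_{i=1}^{m}\nabla_w\,(y_i - y'_i)^2$$

In practice the difference is only a constant rescaling of the update, which can be absorbed into the learning rate; the direction of the step is the same.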

  • You can format the formulas and math symbols in this answer with latex. I would recommend that you do that. – nbro Jan 31 '22 at 12:36