
I've been trying to implement my own neural network library and have been wondering if:

  • The SSE loss function includes the errors of all the training examples in the mini-batch (each training example's loss is summed into one big loss for the batch)

  • The MSE loss function takes the loss of each individual training example in the mini-batch and averages them over the mini-batch size.

or if the summing or averaging of the weight gradients has nothing to do with the loss function used?

I feel like the answer would be clear if I knew other loss functions. If it does matter, does that mean the weight gradients should be averaged for MSE and summed for SSE in mini-batch gradient descent?
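To make the question concrete, here is a minimal sketch (assuming NumPy and a toy linear model, neither of which is part of the question) that computes both losses and their gradients for one mini-batch:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 8                                  # mini-batch size
X = rng.normal(size=(m, 3))            # m examples, 3 features (toy data)
w = rng.normal(size=3)                 # weights of the toy linear model
y = rng.normal(size=m)                 # targets

err = X @ w - y                        # prediction error per example
sse = np.sum(err ** 2)                 # SSE: squared errors summed over the batch
mse = sse / m                          # MSE: the same sum, divided by the batch size

grad_sse = 2 * X.T @ err               # gradient of SSE w.r.t. w
grad_mse = grad_sse / m                # gradient of MSE w.r.t. w (same, scaled by 1/m)

# Per-example gradient of (x_i . w - y_i)^2 is 2 * err_i * x_i.
per_example = 2 * err[:, None] * X     # shape (m, 3)
assert np.allclose(per_example.sum(axis=0), grad_sse)    # summing  -> SSE gradient
assert np.allclose(per_example.mean(axis=0), grad_mse)   # averaging -> MSE gradient
```

The asserts pass: summing the per-example gradients reproduces the SSE gradient, and averaging them reproduces the MSE gradient, so the two only differ by a factor of 1/m.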

1 Answer


What you stated looks correct:

The SSE loss function includes the errors of all the training examples in the mini-batch (each training example's loss is summed into one big loss for the batch):

$$\text{SSE} = \sum_{i=1}^{m} (y_i - y'_i)^2$$

summed over all $m$ samples in the mini-batch.

The MSE loss function takes the loss of each individual training example in the mini-batch and averages them over the mini-batch size:

$$\text{MSE} = \frac{1}{m}\,\text{SSE} = \frac{1}{m}\sum_{i=1}^{m}(y_i - y'_i)^2$$
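One step worth spelling out, since it follows directly from the formula above: the gradient is a linear operator, so the $1/m$ factor carries through to the weight gradients. Averaging the per-example gradients corresponds to MSE, and summing them corresponds to SSE:

$$\nabla_w\,\text{MSE} = \nabla_w\!\left(\frac{1}{m}\,\text{SSE}\right) = \frac{1}{m}\,\nabla_w\,\text{SSE} = \frac{1}{m}\sum_{i=1}^{m}\nabla_w\,(y_i - y'_i)^2$$

In practice the difference is only a constant rescaling of the update, which can be absorbed into the learning rate; the direction of the step is the same.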

  • You can format the formulas and math symbols in this answer with latex. I would recommend that you do that. – nbro Jan 31 '22 at 12:36