In backpropagation, the gradient is used to update the weights via $$w = w - \alpha \frac{dL}{dw},$$ and the gradient of the loss w.r.t. the weights is $$\frac{dL}{dw} = \frac{dL}{dz}\,\frac{dz}{dw} = \left(\frac{dL}{da}\,\frac{da}{dz}\,\frac{1}{m}\right)\frac{dz}{dw}.$$
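For concreteness, here is a minimal NumPy sketch of the setup I have in mind (assumed: a single sigmoid unit with a mean-squared loss averaged over a batch of size $m$; the shapes and variable names are just illustrative, not from any particular framework):

```python
import numpy as np

# Assumed setup: one linear unit with sigmoid activation,
# mean-squared loss averaged over a batch of size m.
rng = np.random.default_rng(0)
m = 4                                   # batch size
x = rng.normal(size=(m, 3))             # inputs
y = rng.normal(size=(m, 1))             # targets
w = rng.normal(size=(3, 1))             # weights
alpha = 0.1                             # learning rate

z = x @ w                               # pre-activation
a = 1.0 / (1.0 + np.exp(-z))            # activation
L = np.mean((a - y) ** 2)               # loss averaged over the batch

# Backprop: dL/da carries the 1/m factor because L is a mean over m examples.
dL_da = 2.0 * (a - y) / m
da_dz = a * (1.0 - a)                   # sigmoid derivative
dL_dz = dL_da * da_dz
dL_dw = x.T @ dL_dz                     # dz/dw contributes x; summed over the batch

w = w - alpha * dL_dw                   # gradient-descent update
```

In this sketch the $\frac{1}{m}$ comes from the `np.mean` in the loss, which is what prompted my question about the batch size.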
Why is there a $\frac{1}{m}$ term? Does the batch size matter, and what happens if it's 1?