In the update rule of RMSprop, do we divide by a matrix?

Question

I've been trying to understand RMSprop for a long time, but there's something that keeps eluding me.

Here is a screenshot from this video by Andrew Ng.

Judging by the "element-wise" comment, $dW$ and $db$ are matrices, so that must mean that $S_{dW}$ is a matrix (or tensor) as well.

So, in the update rule, do they divide a matrix by another matrix? From what I saw on Google, no such operation exists.

score 1 · Answer 1 · answered Dec 18 '20 at 01:05

Yes, you are correct: in this case $S_{dW}$ is a matrix/tensor with the same shape as the gradient matrix/tensor. The update rule divides each element of the gradient by the square root of the corresponding element of $S_{dW}$; in other words, it is an "element-wise division", not a division by a matrix in the linear-algebra sense.

To avoid dividing by zero when an entry of $S_{dW}$ is zero, the equation is usually written like this:
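To see what element-wise division looks like in practice, here is a small NumPy sketch. The numbers are made up purely for illustration; only the names `dW` and `S_dW` come from the question.

```python
import numpy as np

# Hypothetical values, chosen only to illustrate the shapes involved.
dW = np.array([[0.2, -0.4],
               [1.0,  0.5]])      # gradient of the loss w.r.t. W
S_dW = np.array([[0.04, 0.16],
                 [1.00, 0.25]])   # running average of dW**2, same shape as dW

# "/" on NumPy arrays is element-wise: entry (i, j) of dW is divided by
# the square root of entry (i, j) of S_dW. No matrix inverse is involved.
step = dW / np.sqrt(S_dW)

print(step.shape)  # same shape as dW
```

Each entry of `step` depends only on the matching entries of `dW` and `S_dW`, which is why the two arrays must have the same shape.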

$$ W=W-\alpha \frac{dW}{\sqrt{S_{dW}+\epsilon}} \text{ where } \epsilon=10^{-8} $$

If you want to implement it manually in Python, you can take a look at this link.
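For a rough idea of what such an implementation might look like, here is a minimal NumPy sketch of a single update. It assumes the usual running average $S_{dW} = \beta S_{dW} + (1-\beta)\,dW^2$; the function name `rmsprop_step` and the default values of `alpha` and `beta` are my own choices for illustration, not something taken from the video.

```python
import numpy as np

def rmsprop_step(W, dW, S_dW, alpha=0.001, beta=0.9, eps=1e-8):
    """One RMSprop update; every operation below is element-wise.

    alpha and beta defaults are illustrative assumptions.
    """
    # Exponentially weighted average of the squared gradient.
    S_dW = beta * S_dW + (1.0 - beta) * dW**2
    # eps sits inside the square root, matching the formula above.
    W = W - alpha * dW / np.sqrt(S_dW + eps)
    return W, S_dW

# Hypothetical weights and gradient, just to exercise the function.
W = np.array([[1.0, -2.0], [0.5, 3.0]])
dW = np.array([[0.1, 0.0], [-0.3, 2.0]])
S_dW = np.zeros_like(W)   # S_dW starts at zero and has the same shape as W

W_new, S_new = rmsprop_step(W, dW, S_dW)
```

Note that the entry whose gradient is zero is left unchanged, and that `eps` keeps the denominator non-zero there even though the corresponding entry of `S_dW` is still zero.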
