
I've been trying to understand RMSprop for a long time, but there's something that keeps eluding me.

Here is a screenshot from this video by Andrew Ng.

[Image: screenshot of the RMSprop slide from the video]

Judging by the "element-wise" comment, I understand that $dW$ and $db$ are matrices, which must mean that $S_{dW}$ is a matrix (or tensor) as well.

So, in the update rule, do they divide a matrix by another matrix? From what I found on Google, no such operation exists.

nbro
Uriyasama

1 Answer


Yes, you are correct: $S_{dW}$ is a matrix/tensor here, and it has the same shape as the gradient matrix/tensor. The update rule divides each element of the gradient by the square root of the corresponding element of $S_{dW}$. In other words, it is an element-wise division, not a matrix division.
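Written out per entry, this looks as follows. This is only a sketch: it assumes the usual RMSprop accumulator with a decay rate $\beta$, which is not spelled out above.

$$ (S_{dW})_{ij} = \beta \, (S_{dW})_{ij} + (1-\beta) \, (dW_{ij})^2, \qquad W_{ij} = W_{ij} - \alpha \frac{dW_{ij}}{\sqrt{(S_{dW})_{ij}}} $$

Each entry $W_{ij}$ is updated using only the entries at the same position $(i, j)$ in $dW$ and $S_{dW}$, so no matrix inverse is involved.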

To avoid dividing by zero when an element of $S_{dW}$ is zero, a small constant $\epsilon$ is usually added, so the equation looks like this:

$$ W=W-\alpha \frac{dW}{\sqrt{S_{dW}+\epsilon}} \text{ where } \epsilon=10^{-8} $$

If you want to implement it manually in Python, you can take a look at this link.
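For a quick illustration, here is a minimal NumPy sketch of one update step using the equation above. The function name `rmsprop_step`, the default values of `alpha` and `beta`, and the accumulator update with `beta` are my own assumptions for the example, not something taken from the video or the link.

```python
import numpy as np

def rmsprop_step(W, dW, S_dW, alpha=0.001, beta=0.9, eps=1e-8):
    """One RMSprop update (illustrative sketch); every operation is element-wise."""
    # Running average of the squared gradient; dW ** 2 squares each entry.
    S_dW = beta * S_dW + (1 - beta) * dW ** 2
    # For arrays of the same shape, `/` divides entry by entry.
    W = W - alpha * dW / np.sqrt(S_dW + eps)
    return W, S_dW

# Toy values, chosen only to show the shapes involved.
W = np.array([[1.0, 2.0], [3.0, 4.0]])
dW = np.array([[0.1, -0.2], [0.0, 0.4]])
S_dW = np.zeros_like(W)  # same shape as W and dW

W_new, S_new = rmsprop_step(W, dW, S_dW)
```

Note that `W_new` and `S_new` have the same shape as `W`, and that the entry whose gradient is zero is left unchanged, since $\epsilon$ keeps the denominator non-zero.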

CuCaRot