
I have computed the forward and backward passes of the following simple neural network, with one input neuron, one hidden neuron, and one output neuron.

[Figure: a network with a single input neuron, a single hidden neuron, and a single output neuron]

Here are my computations of the forward pass.

\begin{align} net_1 &= xw_{1}+b \\ h &= \sigma (net_1) \\ net_2 &= hw_{2}+b \\ {y}' &= \sigma (net_2), \end{align}

where $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid function and $ L=\frac{1}{2}\sum(y-{y}')^{2} $ is the loss.
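To make this concrete, here is a minimal numeric sketch of the forward pass in Python; the values of $x$, $y$, $w_1$, $w_2$, and $b$ are illustrative only (they are not taken from the figure), and a single bias $b$ is shared by both neurons, as in my equations.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative values only (not taken from the figure)
x, y = 0.5, 1.0             # input and target
w1, w2, b = 0.3, -0.2, 0.1  # weights and the shared bias

# Forward pass, following the equations above
net1 = x * w1 + b
h = sigmoid(net1)
net2 = h * w2 + b
y_hat = sigmoid(net2)       # y' in the notation above

L = 0.5 * (y - y_hat) ** 2  # single output, so the sum has one term
```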

Here are my computations of backpropagation.

\begin{align} \frac{\partial L}{\partial w_{2}} &=\frac{\partial net_2}{\partial w_2}\frac{\partial {y}' }{\partial net_2}\frac{\partial L }{\partial {y}'} \\ \frac{\partial L}{\partial w_{1}} &= \frac{\partial net_1}{\partial w_{1}} \frac{\partial h}{\partial net_1}\frac{\partial net_2}{\partial h}\frac{\partial {y}' }{\partial net_2}\frac{\partial L }{\partial {y}'} \end{align} where \begin{align} \frac{\partial L }{\partial {y}'} & =\frac{\partial (\frac{1}{2}\sum(y-{y}')^{2})}{\partial {y}'}=({y}'-y) \\ \frac{\partial {y}' }{\partial net_2} &={y}'(1-{y}')\\ \frac{\partial net_2}{\partial w_2} &= \frac{\partial(hw_{2}+b) }{\partial w_2}=h \\ \frac{\partial net_2}{\partial h} &=\frac{\partial (hw_{2}+b) }{\partial h}=w_2 \\ \frac{\partial h}{\partial net_1} & =h(1-h) \\ \frac{\partial net_1}{\partial w_{1}} &= \frac{\partial(xw_{1}+b) }{\partial w_1}=x \end{align}

The gradients can be written as

\begin{align} \frac{\partial L }{\partial w_2 } &=h\times {y}'(1-{y}')\times ({y}'-y) \\ \frac{\partial L}{\partial w_{1}} &=x\times h(1-h)\times w_2 \times {y}'(1-{y}')\times ({y}'-y) \end{align}
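A quick way to sanity-check these expressions (a sketch continuing the Python snippet above) is to compare them against central finite differences of the loss:

```python
# Analytic gradients from the chain-rule expressions above
dL_dw2 = h * y_hat * (1 - y_hat) * (y_hat - y)
dL_dw1 = x * h * (1 - h) * w2 * y_hat * (1 - y_hat) * (y_hat - y)

# Numerical check by central finite differences
def loss(w1_, w2_):
    h_ = sigmoid(x * w1_ + b)
    y_hat_ = sigmoid(h_ * w2_ + b)
    return 0.5 * (y - y_hat_) ** 2

eps = 1e-6
num_dw1 = (loss(w1 + eps, w2) - loss(w1 - eps, w2)) / (2 * eps)
num_dw2 = (loss(w1, w2 + eps) - loss(w1, w2 - eps)) / (2 * eps)

print(dL_dw1, num_dw1)  # should agree to several decimal places
print(dL_dw2, num_dw2)
```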

The weight update is

\begin{align} w_{i}^{t+1} \leftarrow w_{i}^{t}-\alpha \frac{\partial L}{\partial w_{i}} \end{align}
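In code (continuing the sketch above, with an illustrative learning rate):

```python
alpha = 0.1  # learning rate (illustrative)

# One gradient-descent step on both weights
w1 -= alpha * dL_dw1
w2 -= alpha * dL_dw2
```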

Are my computations correct?

Eka
  • My review concludes that your analysis is correct. Congratulations also on the presentation. Just a minor editorial note: replace "b" with b1 and b2 (should b1 and b2 also be evaluated?); clarify that sigma is the sigmoid function; and it is better to write w_2*h than h*w_2 (in this way, most of your equations also apply when h is a vector and W is a matrix). – pasaba por aqui Mar 12 '18 at 09:09
  • @pasabaporaqui Thank you for reviewing my calculations, and yes, sigma is the sigmoid function. My biggest doubt is when I calculated `dnet_2/dh = w_2`. It was a surprise for me; I never thought we used the weight values in the backprop calculation. I didn't understand this part: `better write w_2*h than h*w_2 (in this way, most of your equations are applicable to h vector and W matrix)`? – Eka Mar 12 '18 at 15:32
  • @Eka: if the value of h is increased by 1 unit, the value of net2 is increased by w2. This is the meaning of this partial derivative. – pasaba por aqui Mar 12 '18 at 16:00
  • @Eka: do not worry about the comment about the order; it is just more practical/traditional to write w*h than h*w. In some problems, W will be a matrix and h a vector, in which case Wh is valid but hW is not. – pasaba por aqui Mar 12 '18 at 16:02

1 Answer


One important point I missed in the first review: the error is a summation, so its derivative is also a summation.

About the offsets (biases) "b": they are usually different in each neuron (unless fixed to some value, such as 0). Thus, replace them with b1 and b2. Moreover, they should be optimized in the same way as the weights.
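For completeness, a sketch of the bias gradients (assuming separate biases $b_1$ and $b_2$): since $\frac{\partial net_1}{\partial b_1}=1$ and $\frac{\partial net_2}{\partial b_2}=1$, the same chain rule gives

\begin{align} \frac{\partial L}{\partial b_2} &= {y}'(1-{y}')\times ({y}'-y) \\ \frac{\partial L}{\partial b_1} &= h(1-h)\times w_2\times {y}'(1-{y}')\times ({y}'-y), \end{align}

and they are updated with the same rule as the weights.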

pasaba por aqui