1

Suppose the following is the neural network I want to train and assume that there is a batch normalization layer for each layer of the neural network

enter image description here.

My focus is on the activity of batch normalization layer of the hidden layer. Assume that mini-batch size is $3$. The following are the first four outputs of the hidden layer without using batch normalization layer.

$$\left(\begin{array}{c} 4 \\ 5 \\ 2 \\ 1 \\ \end{array}\right), \left(\begin{array}{c} 3 \\ 4 \\ 6 \\ 0 \\ \end{array}\right), \left(\begin{array}{c} 1 \\ 4 \\ 7 \\ 9 \\ \end{array}\right), \left(\begin{array}{c} 3 \\ 5 \\ 7 \\ 9 \\ \end{array}\right)$$

So the first mini-batch is$\left(\begin{array}{c} 4 \\ 5 \\ 2 \\ 1 \\ \end{array}\right), \left(\begin{array}{c} 3 \\ 4 \\ 6 \\ 0 \\ \end{array}\right), \left(\begin{array}{c} 1 \\ 4 \\ 7 \\ 9 \\ \end{array}\right)$ and has the following statistics

mean = $\mathbb{E}[X] = \left(\begin{array}{c} \dfrac{8}{3} \\ \dfrac{13}{3} \\ 5 \\ \dfrac{10}{3} \\ \end{array}\right), Var(X) = \left(\begin{array}{c} \dfrac{14}{9} \\ \dfrac{2}{9} \\ \dfrac{14}{3} \\ \dfrac{146}{9} \\ \end{array}\right)$

So, I am thinking that the batch normalization can be applied during from fourth iteration since mini-batch of outputs of hidden layer are available and so fourth vector $\left(\begin{array}{c} 3 \\ 5 \\ 7 \\ 9 \\ \end{array}\right)$ will be mapped to $\left(\begin{array}{c} 0.26 \\ 1.41 \\ 0.92 \\ 1.4 \\ \end{array}\right)$

Is it true that if we apply batch normalization the outputs of neural network are $$\left(\begin{array}{c} 4 \\ 5 \\ 2 \\ 1 \\ \end{array}\right), \left(\begin{array}{c} 3 \\ 4 \\ 6 \\ 0 \\ \end{array}\right), \left(\begin{array}{c} 1 \\ 4 \\ 7 \\ 9 \\ \end{array}\right),\left(\begin{array}{c} 0.26 \\ 1.41 \\ 0.92 \\ 1.4 \\ \end{array}\right)$$

I am thinking that I may be totally wrong because the batch normalization layer is not active till the completion of first mini-batch and also using the stats of first mini batch on the outputs of next mini-batch. If wrong, I want to know what is wrong or a simple numerical example by at-least running the batch normalization layer of neural network for mini-batch + 1 number of iterations.

hanugm
  • 3,571
  • 3
  • 18
  • 50

0 Answers0