2

While tuning my neural networks I often run into the problem that every time I train the exact same network, it gives me a different final error due to the random initialization of the weights. Sometimes the differences are small and negligible, sometimes they are significant, depending on the data and the architecture.

My problem arises when I want to tune parameters such as the number of layers or neurons, because I don't know whether a change in the final error was caused by the recent change to the network's architecture or is simply an effect of the aforementioned randomness.

My question is: how do I deal with this issue?

GKozinski
  • 1,240
  • 8
  • 19

3 Answers

2

I don't think you can.

Say a NN with 3 layers gives an accuracy of 95.3% and another NN with 4 layers gives an accuracy of 95.4%. There is no guarantee that the 4-layer NN is better than the 3-layer NN, since with different initial values the 3-layer NN might perform better.

You could run each configuration multiple times and argue statistically that one is better, but this is computationally intensive. A sketch of that approach is below.
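Here is a minimal sketch of that multi-run comparison, assuming a Keras-style model with `fit`/`evaluate` and a hypothetical `build_model(n_layers)` factory that you would write yourself to return a freshly compiled network:

```python
# Minimal sketch: train each architecture several times with fresh random
# initializations and compare the accuracy distributions, not single numbers.
import numpy as np
from scipy import stats

def repeated_accuracies(build_model, n_layers, x_train, y_train, x_val, y_val, runs=10):
    """Train the same architecture `runs` times, each with new random weights."""
    accs = []
    for _ in range(runs):
        model = build_model(n_layers)          # hypothetical factory: new random weights each call
        model.fit(x_train, y_train, epochs=20, verbose=0)
        _, acc = model.evaluate(x_val, y_val, verbose=0)  # assumes metrics=['accuracy']
        accs.append(acc)
    return np.array(accs)

# acc3 = repeated_accuracies(build_model, 3, x_train, y_train, x_val, y_val)
# acc4 = repeated_accuracies(build_model, 4, x_train, y_train, x_val, y_val)
# t, p = stats.ttest_ind(acc3, acc4)   # small p-value suggests a real difference
# print(acc3.mean(), acc4.mean(), p)
```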

codeblooded
  • 163
  • 4
1

There are two weight-initialization methods for neural networks: zero initialization and random initialization.

https://towardsdatascience.com/weight-initialization-techniques-in-neural-networks-26c649eb3b78

If you choose the zero-initialization method, every training run may give the same results. Alternatively, you can use transfer learning, depending on your problem; it lets you start from the same parameters every time. As a last (and hardest) resort, you can write your own weight arrays and feed them to your layers. A sketch of both ideas follows.
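A minimal sketch of both ideas in Keras, assuming TensorFlow is available (layer sizes and the weights filename are placeholders):

```python
import tensorflow as tf

# Option 1: zero initialization. Fully deterministic, but usually a poor choice
# for hidden layers because all neurons in a layer stay identical during training.
zero_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(100,),
                          kernel_initializer="zeros", bias_initializer="zeros"),
    tf.keras.layers.Dense(10, activation="softmax",
                          kernel_initializer="zeros", bias_initializer="zeros"),
])

# Option 2: draw random initial weights once, save them, and reload them before
# every run, so each experiment starts from exactly the same point.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(100,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.save_weights("initial_weights.h5")
# ... later, before each training run:
model.load_weights("initial_weights.h5")
```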

The problem you mention is one of the most interesting problems in evaluating the performance of neural networks. You can use cross-validation to verify your model's accuracy; it will give more reliable results.
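A minimal k-fold cross-validation sketch, assuming scikit-learn, NumPy arrays for the data, and the same hypothetical `build_model()` factory as above:

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validated_accuracy(build_model, x, y, k=5):
    """Average validation accuracy over k folds, which smooths out the effect
    of any single lucky or unlucky initialization/split."""
    scores = []
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True, random_state=42).split(x):
        model = build_model()
        model.fit(x[train_idx], y[train_idx], epochs=20, verbose=0)
        _, acc = model.evaluate(x[val_idx], y[val_idx], verbose=0)
        scores.append(acc)
    return np.mean(scores), np.std(scores)
```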

1

There are other sources of randomness that will lead to different results in addition to weight initialization, for example dropout layers; make sure you specify the random seed. Also, when reading data with flow from directory, make sure you set shuffle to False, or if you do not, set the random seed. If you use transfer learning, make that part of your network non-trainable. Some networks have dropout in them and do not provide a way to set the random seed. If you are using a GPU there are even more issues to contend with. A sketch of pinning these seeds is below.
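A minimal sketch of pinning the usual sources of randomness in a TensorFlow/Keras pipeline, assuming TF 2.x (the directory path and image size are placeholders):

```python
import os, random
import numpy as np
import tensorflow as tf

SEED = 42
os.environ["PYTHONHASHSEED"] = str(SEED)
random.seed(SEED)          # Python's own RNG
np.random.seed(SEED)       # NumPy, used by many preprocessing utilities
tf.random.set_seed(SEED)   # TensorFlow ops, including dropout and initializers
# On GPU, newer TF versions also offer tf.config.experimental.enable_op_determinism(),
# though some ops may still be non-deterministic or slower.

# Data loading: either disable shuffling or give the generator a fixed seed.
datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)
train_gen = datagen.flow_from_directory(
    "data/train",             # placeholder path
    target_size=(224, 224),
    shuffle=True,
    seed=SEED,                # or shuffle=False for a fixed order
)
```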

Gerry P
  • 694
  • 4
  • 10