
This is a topic I have been debating with my colleagues for some time now; perhaps you could also weigh in on it.

Artificial neural networks use random weight initialization within a certain value range. These random parameters are drawn from a pseudorandom number generator (e.g., sampled from a Gaussian distribution), and this has been sufficient so far.
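To make the setup concrete, here is a minimal sketch (my own illustration, not taken from any particular framework's source) of how such initial weights are typically produced from a seeded pseudorandom generator; the layer dimensions and scaling are hypothetical:

```python
import numpy as np

# Initial weights for one fully connected layer, drawn from a seeded PRNG.
rng = np.random.default_rng(seed=42)          # PCG64 pseudorandom generator
fan_in, fan_out = 768, 3072                   # hypothetical layer dimensions
std = np.sqrt(2.0 / fan_in)                   # He-style scaling to keep values in range
W = rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))

# The same seed reproduces exactly the same "random" weights:
W_again = np.random.default_rng(seed=42).normal(0.0, std, (fan_in, fan_out))
assert np.array_equal(W, W_again)
```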

With a large enough sample, pseudorandom numbers can, in principle, be distinguished from truly random numbers by statistical tests. With a huge neural network like GPT-3, which has roughly 175 billion trainable parameters, I would guess that applying the same statistical tests to GPT-3's initial weights would also clearly indicate that these parameters are pseudorandom.
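For illustration, this is roughly the kind of check I have in mind; the sample size, seed, and tests are my own choices, and detecting pseudorandomness in practice would require far more elaborate test batteries (e.g., NIST SP 800-22 or TestU01):

```python
import numpy as np
from scipy import stats

# Draw a large sample of "initial weights" and run two simple statistical tests on it.
rng = np.random.default_rng(seed=0)
w = rng.normal(0.0, 0.02, size=1_000_000)

# 1) Distributional check: Kolmogorov-Smirnov test against the intended Gaussian.
ks_stat, ks_p = stats.kstest(w, "norm", args=(0.0, 0.02))

# 2) Sequential check: Wald-Wolfowitz runs test above/below the median,
#    which looks for structure in the *order* of the draws.
signs = w > np.median(w)
runs = 1 + np.count_nonzero(signs[1:] != signs[:-1])
n1, n2 = int(signs.sum()), int((~signs).sum())
mu = 2 * n1 * n2 / (n1 + n2) + 1
var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
z = (runs - mu) / np.sqrt(var)

print(f"KS p-value: {ks_p:.3f}, runs-test z: {z:.3f}")
# A modern PRNG typically passes such simple checks; the question is whether any
# residual structure that specialised tests *could* detect matters at GPT-3 scale.
```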

With a model of this size, could the repeatable structure in the initial weights caused by their pseudorandomness, at least in theory, affect the fitting procedure enough to influence the completed model (in terms of generalization or performance)? In other words, could the quality of the randomness affect the fitting of huge neural networks?

nbro
Aki Koivu
    Nice theory, but difficult to comment upon. You see, the pseudorandomness will cause redundancy, and this will result in the NN not leveraging its actual approximation power, which is dictated by its size. This would be a good thing if the task is simple and a bad thing if the task is complex (recall the bias-variance tradeoff). One could probably formalize this using VC dimensions. –  Feb 09 '21 at 21:37

0 Answers