
When trying to map artificial neural network models onto biological facts, I could not find an answer regarding the biological justification for randomly initializing the weights.

Perhaps this is simply not yet known, given our current understanding of biological neurons?

Leo Gallucci

2 Answers


In short

I mentioned in another post how Artificial Neural Network (ANN) weights are a relatively crude abstraction of the connections between neurons in the brain. Similarly, the random weight initialization step in ANNs is a simple procedure that abstracts away the complexity of central nervous system development and synaptogenesis.

A bit more detail

The neocortex (one of its columns, more specifically) is a region of the brain that somewhat resembles an ANN. It has a laminar structure whose layers receive axons from, and send axons to, other brain regions. Those layers can be viewed as the "input" and "output" layers of an ANN (axons "send" signals, dendrites "receive" them). The remaining layers do intermediate processing and can be viewed as the ANN's "hidden" layers.

When building an ANN, the programmer can set the number of layers and the number of units in each layer. In the neocortex, the number of layers and the cell counts in each layer are determined mostly by genes (however, see Human echolocation for an example of post-birth brain plasticity). Chemical cues guide the positions of the cell bodies and create the laminar structure; they also seem to guide long-range axonal connections between distant brain regions. The cells then sprout dendrites in characteristic "tree-like" patterns (see NeuroMorpho.org for examples). The dendrites then form synapses with the axons or cell bodies they encounter along the way, generally based on the type of cell encountered.
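As a rough sketch of the ANN side of this analogy (the layer sizes below are arbitrary choices for illustration, not values from any source), the programmer's choice of layers and units might look like this:

```python
import numpy as np

# Hypothetical architecture choices: in an ANN the programmer sets these,
# whereas in the neocortex the layer count and per-layer cell counts are
# largely fixed by genes.
layer_sizes = [784, 256, 128, 10]   # "input", two "hidden" layers, "output"

# One weight matrix per pair of adjacent layers; left at zero here,
# since initialization is shown in the next sketch.
weights = [np.zeros((n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

for i, w in enumerate(weights):
    print(f"layer {i} -> layer {i + 1}: {w.shape[0]} x {w.shape[1]} connections")
```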

This last phase is probably the most analogous to the idea of random weight initialization in ANNs. Based on where a cell is positioned and what type it is, the other neurons it encounters will be somewhat random, and so will the connections to them. These connections are probably not very strong initially, but they have room to get stronger during learning (roughly analogous to initial random weights between 0 and ~0.1, with 1 being the strongest possible connection). Furthermore, most cells are either inhibitory or excitatory (analogous to negative and positive weights).
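A toy illustration of that kind of initialization (the 0-0.1 range comes from the analogy above; the layer sizes and the ~80/20 excitatory/inhibitory split are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post = 128, 64                    # hypothetical layer sizes

# Weak initial connections, analogous to newly formed synapses:
# magnitudes drawn uniformly from [0, 0.1], where 1.0 would be the
# strongest possible connection on this toy scale.
w = rng.uniform(0.0, 0.1, size=(n_pre, n_post))

# Each presynaptic "cell" is either excitatory or inhibitory, so all of
# its outgoing weights share one sign (assumed ~80% excitatory here).
excitatory = rng.random(n_pre) < 0.8
sign = np.where(excitatory, 1.0, -1.0)
w *= sign[:, None]

print(w.min(), w.max())
```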

Keep in mind that this randomization process has a heavy spatial component in real brains. Neurons are small, so they make these connections to nearby neurons roughly 10-200 microns away. The long-distance connections between brain regions are mostly "programmed in" via genes. In most ANNs there is no distance-based aspect to the initialization of connection weights (although convolutional ANNs implicitly perform something like distance-based wiring via their sliding window).
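For contrast, a hypothetical distance-aware initialization could look something like the sketch below (the cell positions, the ~200-micron cutoff, and the weight range are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 500

# Hypothetical cell bodies scattered on a 1 mm x 1 mm sheet (in microns).
pos = rng.uniform(0, 1000, size=(n_cells, 2))

# Pairwise distances between all cells.
d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)

# Only allow weak random connections to neighbours within ~200 microns,
# mirroring the local wiring described above; long-range wiring between
# regions would be "programmed in" separately.
local = (d > 0) & (d < 200)
w = np.where(local, rng.uniform(0.0, 0.1, size=d.shape), 0.0)

print("mean number of local connections per cell:", local.sum(axis=1).mean())
```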

There is also the phenomenon of synaptic pruning, which might be analogous to creating many low-weight connections in an ANN initially (birth), training it for some number of epochs (adolescence), and then removing most of the low-weight connections (consolidation in adulthood).
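A minimal sketch of that pruning analogy (the 0.05 threshold and the matrix sizes are arbitrary illustrative choices):

```python
import numpy as np

def prune(weights, threshold=0.05):
    """Zero out connections whose learned strength stayed low; the
    threshold is an arbitrary illustrative value."""
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.uniform(0.0, 0.1, size=(128, 64))   # dense "adolescent" weights
w_pruned, kept = prune(w)                   # "consolidation in adulthood"
print(f"kept {kept.mean():.0%} of connections")
```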

Justas
  • Do you by chance also have some hints on https://ai.stackexchange.com/questions/10416/is-the-thatcher-face-illusion-only-limited-to-face-recognition I haven't received any answer yet. – Leo Gallucci Feb 14 '19 at 20:06

I am not a DL expert, but these are my short thoughts on it:

I think this is because, from an information-theoretic point of view, it is believed to be a good way to keep the network from falling into some weird state right from the beginning. Remember: DNNs are nonlinear approximators of continuous functions, so they have a certain storage capacity for learning some number of functions that map input to output. If you look at topics like data leakage, you will see that NNs quickly try to cheat you if they can :D. The optimization applied during training is heavily affected by the initial state, so starting with a random initialization at least prevents all of your neurons from doing the same thing at the beginning.
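To make the "neurons all do the same thing" point concrete, here is a toy NumPy sketch (sizes and values are arbitrary): with a constant initialization every hidden unit computes the same output and receives the same gradient, so training can never push them apart, whereas a random initialization breaks that symmetry.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 10))          # a toy batch of inputs

def hidden_activations(W):
    return np.tanh(x @ W)              # one hidden layer of tanh units

W_const = np.full((10, 5), 0.5)        # every weight identical
W_rand  = rng.normal(0, 0.1, (10, 5))  # small random weights

h_const = hidden_activations(W_const)
h_rand  = hidden_activations(W_rand)

# With constant init, all 5 hidden units produce identical outputs,
# so gradient descent cannot make them specialise differently.
print(np.allclose(h_const, h_const[:, :1]))   # True  -> units are clones
print(np.allclose(h_rand,  h_rand[:, :1]))    # False -> symmetry is broken
```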

Biological reasoning: from the viewpoint of a neurobiologist, I can recommend reading about the Hebbian rule and how neural systems work in general (e.g. google how neurons find their targets), and then comparing that to what is known about how neurons in the cerebrum develop their dendritic interconnections in the first 3 years after birth. In summary, there are patterns in nature that can look similar, inspiring, and even reasonable. But I would say the reason this random initialization is recommended is backed by mathematical and information-theoretic assumptions rather than purely biological arguments.
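For reference, the basic Hebbian idea ("cells that fire together, wire together") strengthens a connection in proportion to correlated pre- and postsynaptic activity, Δw = η · x_pre · y_post. A minimal sketch (the learning rate, sizes, and linear units are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post = 20, 5
w = rng.uniform(0.0, 0.1, size=(n_pre, n_post))   # weak random start
eta = 0.01                                        # learning rate (arbitrary)

for _ in range(100):
    x = rng.random(n_pre)        # presynaptic activity
    y = x @ w                    # postsynaptic activity (linear units)
    w += eta * np.outer(x, y)    # co-active pre/post pairs get stronger

# Note: pure Hebbian updates let weights grow without bound; practical
# models add normalisation or decay (e.g. Oja's rule).
print(w.mean())
```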

PlagTag