Questions tagged [weights-initialization]

For questions about the different techniques of initializing weights (or parameters) of machine learning models.

39 questions
16 votes · 5 answers

Why are the initial weights of neural networks randomly initialised?

This might sound silly to someone who has plenty of experience with neural networks but it bothers me... Random initial weights might give you better results that would be somewhat closer to what a trained neural network should look like, but it…
7 votes · 1 answer

How to solve the problem of too big activations when using genetic algorithms to train neural networks?

I am trying to create a fixed-topology MLP from scratch (with C#), which can solve some simple problems, such as the XOR problem and MNIST classification. The network will be trained purely with genetic algorithms instead of back-propagation. Here…
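
For reference, one common way to keep activations from blowing up is to scale the initial (and mutated) weights by the layer's fan-in and to use a bounded activation. The asker works in C#; the sketch below is NumPy-only, with arbitrary layer sizes, and is an illustration rather than the asker's code.

```python
# Sketch (not the asker's C# code): keep GA-evolved weights in a range that
# scales with fan-in, so pre-activations W @ x stay O(1) as layers get wider.
import numpy as np

rng = np.random.default_rng(0)

def init_genome(layer_sizes):
    """One genome = a list of weight matrices, each scaled by 1/sqrt(fan_in)."""
    genome = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        limit = 1.0 / np.sqrt(fan_in)               # keeps Var(W @ x) roughly constant
        genome.append(rng.uniform(-limit, limit, size=(fan_out, fan_in)))
    return genome

def forward(genome, x):
    for W in genome:
        x = np.tanh(W @ x)                          # bounded activation also limits growth
    return x

genome = init_genome([784, 64, 10])                 # e.g. an MNIST-sized MLP
print(forward(genome, rng.standard_normal(784)).round(3))
```
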
7 votes · 1 answer

Is there a proper initialization technique for the weight matrices in multi-head attention?

Self-attention layers have 4 learnable tensors (in the vanilla formulation): the query matrix $W_Q$, the key matrix $W_K$, the value matrix $W_V$, and the output matrix $W_O$. Nice illustration from https://jalammar.github.io/illustrated-transformer/. However, I do not…
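
For reference, a minimal sketch of one common choice: Xavier/Glorot-uniform initialization applied to all four projection matrices. The model width below is an arbitrary assumed value, and this is illustrative rather than an official recipe for any particular library.

```python
# Sketch: Xavier/Glorot-uniform init on the four attention projections.
import torch
import torch.nn as nn

d_model = 512                                   # assumed model width

w_q = nn.Linear(d_model, d_model, bias=False)   # query projection
w_k = nn.Linear(d_model, d_model, bias=False)   # key projection
w_v = nn.Linear(d_model, d_model, bias=False)   # value projection
w_o = nn.Linear(d_model, d_model, bias=False)   # output projection

for proj in (w_q, w_k, w_v, w_o):
    nn.init.xavier_uniform_(proj.weight)        # Var(W) = 2 / (fan_in + fan_out)
```
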
7 votes · 0 answers

Why is there a Uniform and Normal version of He / Xavier initialization in DL libraries?

Two of the most popular initialization schemes for neural network weights today are Xavier and He. Both methods propose random weight initialization with a variance dependent on the number of input and output units. Xavier proposes $$W \sim…
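
For reference, the uniform and normal variants target the same variance and differ only in the distribution that realizes it. A quick PyTorch check (arbitrary layer shape, purely illustrative):

```python
# Sketch: xavier_uniform_ and xavier_normal_ produce weights with the same
# variance; only the underlying distribution differs.
import torch
import torch.nn as nn

w1 = torch.empty(256, 512)
w2 = torch.empty(256, 512)

nn.init.xavier_uniform_(w1)   # U(-a, a) with a = sqrt(6 / (fan_in + fan_out))
nn.init.xavier_normal_(w2)    # N(0, 2 / (fan_in + fan_out))

print(w1.var().item(), w2.var().item())   # both close to the target below
print(2 / (512 + 256))                    # target variance ≈ 0.0026
```
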
6 votes · 3 answers

Is random initialization of the weights the only choice to break the symmetry?

My knowledge: Suppose you have a fully connected layer, and that each neuron performs an operation like $a = g(w^T x + b)$, where $a$ is the output of the neuron, $x$ the input, $g$ our generic activation function, and finally $w$ and $b$ our…
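
For reference, a tiny NumPy demonstration of the symmetry problem (my own sketch, with a made-up two-layer network): with a constant initialization, every hidden unit in a layer computes the same output and receives the same gradient, so gradient descent can never make them differ.

```python
# Sketch of the symmetry problem: constant init makes hidden units exact clones.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)            # one input example
y = 1.0                               # scalar target

W1 = np.zeros((3, 4))                 # constant (here: zero) initialization
w2 = np.ones(3)                       # identical output weights per hidden unit

h = np.tanh(W1 @ x)                   # all hidden activations identical
y_hat = w2 @ h
dL_dy = y_hat - y                     # squared-error gradient
dL_dW1 = np.outer(dL_dy * w2 * (1 - h**2), x)

print(dL_dW1)                         # every row is identical -> units stay clones
```
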
6 votes · 1 answer

How are the kernels initialized in a convolutional neural network?

I am currently learning about CNNs. I am confused about how filters (aka kernels) are initialized. Suppose that we have a $3 \times 3$ kernel. How are the values of this filter initialized before training? Do you just use predefined image kernels?…
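
For reference, convolution kernels are not set to predefined image filters; they are drawn randomly like any other weight tensor, typically with a fan-in-scaled scheme. A minimal PyTorch sketch with assumed channel counts:

```python
# Sketch: conv kernels are initialized with a fan-in-scaled random scheme,
# not with hand-crafted filters such as edge detectors.
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
nn.init.kaiming_normal_(conv.weight, nonlinearity='relu')   # He init for ReLU nets
nn.init.zeros_(conv.bias)

print(conv.weight.std().item())   # roughly sqrt(2 / (3 * 3 * 3)) ≈ 0.27
```
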
5 votes · 0 answers

What is the justification for Kaiming He initialization?

I've been trying to understand where the formulas for Xavier and Kaiming He initialization come from. My understanding is that these initialization schemes come from a desire to keep the gradients stable during back-propagation (avoiding…
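
For reference, my rough reconstruction of the variance-preservation argument behind He initialization (following He et al., 2015; a sketch, not a full proof): for a layer $y_l = W_l x_l$ with $x_l = \mathrm{ReLU}(y_{l-1})$ and i.i.d. zero-mean weights over $n_l$ inputs,

$$\operatorname{Var}(y_l) = n_l \operatorname{Var}(w_l)\, \mathbb{E}[x_l^2], \qquad \mathbb{E}[x_l^2] = \tfrac{1}{2}\operatorname{Var}(y_{l-1}),$$

since ReLU keeps only the positive half of a symmetric $y_{l-1}$. Requiring $\operatorname{Var}(y_l) = \operatorname{Var}(y_{l-1})$ at every layer then gives

$$\operatorname{Var}(w_l) = \frac{2}{n_l}.$$
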
4 votes · 1 answer

Do we know what the units of neural networks will do before we train them?

I was learning about back-propagation and, looking at the algorithm, there is no particular 'partiality' given to any unit. What I mean by partiality there is that you have no particular characteristic associated with any unit, and this results in…
4 votes · 0 answers

Can the quality of randomness in neural network initialization affect model fitting?

This is a topic I have been arguing about for some time now with my colleagues, maybe you could also voice your opinion about it. Artificial neural networks use random weight initialization within a certain value range. These random parameters are…
3 votes · 1 answer

Initial State of RNN

Can I initialize the initial state of my RNN to be non-zero? I have some initial condition of the sequence and I want to use this initial condition as the initial state.
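
For reference, most RNN APIs let you pass the initial hidden state explicitly, so it can encode a known initial condition. A minimal PyTorch sketch with assumed sizes:

```python
# Sketch: pass an explicit initial hidden state to the recurrent layer
# (here nn.GRU); omitting it defaults to a zero initial state.
import torch
import torch.nn as nn

rnn = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 10, 8)      # (batch, seq_len, features)
h0 = torch.randn(1, 4, 16)     # (num_layers, batch, hidden): e.g. an encoding
                               # of the sequence's initial condition
out, h_n = rnn(x, h0)
print(out.shape, h_n.shape)
```
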
3 votes · 0 answers

How efficient is SCAWI weight initialization method?

I'm currently in the middle of a project (for my thesis) constructing a deep neural network. Since I'm still in the research part, I'm trying to find various ways and techniques to initialize weights. Obviously, every way will be evaluated and we…
3 votes · 1 answer

How are newer weight initialization techniques better than zero or random initialization?

How do newer weight initialization techniques (He, Xavier, etc) improve results over zero or random initialization of weights in a neural network? Is there any mathematical evidence behind this?
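
For reference, a quick numeric illustration (my own sketch, with arbitrary depth and width) of why the scale of the initialization matters: with an unscaled Gaussian init the signal vanishes or explodes with depth, while a fan-in-scaled (He) init keeps its magnitude roughly constant.

```python
# Sketch: propagate one input through 20 ReLU layers and watch the activation
# scale under a poorly scaled init vs. a fan-in-scaled (He) init.
import numpy as np

rng = np.random.default_rng(0)
n, depth = 512, 20
x0 = rng.standard_normal(n)

for label, std in [("std=0.01 (too small)", 0.01),
                   ("He: sqrt(2/n)", np.sqrt(2.0 / n))]:
    x = x0.copy()
    for _ in range(depth):
        W = rng.standard_normal((n, n)) * std
        x = np.maximum(W @ x, 0.0)        # ReLU
    print(f"{label:>22}: final activation std = {x.std():.3e}")
```
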
3 votes · 1 answer

How does the initialization of the value function and definition of the reward function affect the performance of the RL agent?

Is there any empirical/theoretical evidence on the effect of the initial state-action and state values (the values an RL agent assigns to visited states) on the training of an RL agent via MC methods Policy Evaluation and GLIE Policy…
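
For reference, one well-known effect of value initialization is optimistic initialization: starting the value estimates above any achievable return pushes a greedy agent to keep trying untested actions. A tiny NumPy sketch (made-up table sizes and returns, not tied to any specific environment):

```python
# Sketch: optimistic vs. zero initialization of a Q-table. Optimistic values
# make a greedy agent explore untried actions early on.
import numpy as np

n_states, n_actions = 5, 3
q_zero       = np.zeros((n_states, n_actions))
q_optimistic = np.full((n_states, n_actions), 10.0)   # assumed > any achievable return

# After one observed return of 1.0 for (state 0, action 0):
for q in (q_zero, q_optimistic):
    q[0, 0] = 1.0

print(np.argmax(q_zero[0]))        # 0 -> greedy agent repeats the only tried action
print(np.argmax(q_optimistic[0]))  # 1 -> greedy agent moves on to an untried action
```
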
2 votes · 0 answers

Is orthogonal initialization still useful when hidden layer sizes vary?

PyTorch's orthogonal initialization cites "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks", Saxe, A. et al. (2013), which attributes the usefulness of orthogonal initialization to the fact that, for a…
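
For reference, `torch.nn.init.orthogonal_` also accepts rectangular weights, in which case it produces a semi-orthogonal matrix: the Gram matrix along the smaller dimension is the identity. A quick check with assumed layer sizes:

```python
# Sketch: orthogonal init on a non-square weight gives a semi-orthogonal matrix
# (here the rows are orthonormal because there are fewer rows than columns).
import torch
import torch.nn as nn

W = torch.empty(64, 256)          # hidden sizes differ: 256 -> 64
nn.init.orthogonal_(W)

gram = W @ W.T                    # (64, 64)
print(torch.allclose(gram, torch.eye(64), atol=1e-5))   # True: rows orthonormal
```
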
2 votes · 1 answer

How to decode P bits that represent a random weight generator?

So I've been tasked by my neural network professor at university to replicate the following research: Intelligent Breast Cancer Diagnosis Using Hybrid GA-ANN. Each chromosome represents a possible net, more specifically, a possible MLP network…