Questions tagged [weights-initialization]

For questions about the different techniques of initializing weights (or parameters) of machine learning models.

39 questions
16 votes · 5 answers

Why are the initial weights of neural networks randomly initialised?

This might sound silly to someone who has plenty of experience with neural networks but it bothers me... Random initial weights might give you better results that would be somewhat closer to what a trained neural network should look like, but it…
7 votes · 1 answer

How to solve the problem of too big activations when using genetic algorithms to train neural networks?

I am trying to create a fixed-topology MLP from scratch (with C#), which can solve some simple problems, such as the XOR problem and MNIST classification. The network will be trained purely with genetic algorithms instead of back-propagation. Here…
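
For reference, one common way to keep activations from blowing up is to scale the initial (and mutated) weights by the layer's fan-in and to use a bounded activation. The asker works in C#; the sketch below is NumPy-only, with arbitrary layer sizes, and is an illustration rather than the asker's code.

```python
# Sketch (not the asker's C# code): keep GA-evolved weights in a range that
# scales with fan-in, so pre-activations W @ x stay O(1) as layers get wider.
import numpy as np

rng = np.random.default_rng(0)

def init_genome(layer_sizes):
    """One genome = a list of weight matrices, each scaled by 1/sqrt(fan_in)."""
    genome = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        limit = 1.0 / np.sqrt(fan_in)               # keeps Var(W @ x) roughly constant
        genome.append(rng.uniform(-limit, limit, size=(fan_out, fan_in)))
    return genome

def forward(genome, x):
    for W in genome:
        x = np.tanh(W @ x)                          # bounded activation also limits growth
    return x

genome = init_genome([784, 64, 10])                 # e.g. an MNIST-sized MLP
print(forward(genome, rng.standard_normal(784)).round(3))
```
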
7 votes · 1 answer

Is there a proper initialization technique for the weight matrices in multi-head attention?

Self-attention layers have 4 learnable tensors (in the vanilla formulation): the query matrix $W_Q$, the key matrix $W_K$, the value matrix $W_V$, and the output matrix $W_O$. Nice illustration from https://jalammar.github.io/illustrated-transformer/. However, I do not…
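
For reference, a minimal sketch of one common choice: Xavier/Glorot-uniform initialization applied to all four projection matrices. The model width below is an arbitrary assumed value, and this is illustrative rather than an official recipe for any particular library.

```python
# Sketch: Xavier/Glorot-uniform init on the four attention projections.
import torch
import torch.nn as nn

d_model = 512                                   # assumed model width

w_q = nn.Linear(d_model, d_model, bias=False)   # query projection
w_k = nn.Linear(d_model, d_model, bias=False)   # key projection
w_v = nn.Linear(d_model, d_model, bias=False)   # value projection
w_o = nn.Linear(d_model, d_model, bias=False)   # output projection

for proj in (w_q, w_k, w_v, w_o):
    nn.init.xavier_uniform_(proj.weight)        # Var(W) = 2 / (fan_in + fan_out)
```
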
7 votes · 0 answers

Why is there a Uniform and Normal version of He / Xavier initialization in DL libraries?

Two of the most popular initialization schemes for neural network weights today are Xavier and He. Both methods propose random weight initialization with a variance dependent on the number of input and output units. Xavier proposes $$W \sim…
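
For reference, the uniform and normal variants target the same variance and differ only in the distribution that realizes it. A quick PyTorch check (arbitrary layer shape, purely illustrative):

```python
# Sketch: xavier_uniform_ and xavier_normal_ produce weights with the same
# variance; only the underlying distribution differs.
import torch
import torch.nn as nn

w1 = torch.empty(256, 512)
w2 = torch.empty(256, 512)

nn.init.xavier_uniform_(w1)   # U(-a, a) with a = sqrt(6 / (fan_in + fan_out))
nn.init.xavier_normal_(w2)    # N(0, 2 / (fan_in + fan_out))

print(w1.var().item(), w2.var().item())   # both close to the target below
print(2 / (512 + 256))                    # target variance ≈ 0.0026
```
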
6 votes · 3 answers

Is random initialization of the weights the only choice to break the symmetry?

My knowledge: Suppose you have a fully connected layer, and that each neuron performs an operation like $a = g(w^T x + b)$, where $a$ is the output of the neuron, $x$ the input, $g$ our generic activation function, and finally $w$ and $b$ our…
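
For reference, a tiny NumPy demonstration of the symmetry problem (my own sketch, with a made-up two-layer network): with a constant initialization, every hidden unit in a layer computes the same output and receives the same gradient, so gradient descent can never make them differ.

```python
# Sketch of the symmetry problem: constant init makes hidden units exact clones.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)            # one input example
y = 1.0                               # scalar target

W1 = np.zeros((3, 4))                 # constant (here: zero) initialization
w2 = np.ones(3)                       # identical output weights per hidden unit

h = np.tanh(W1 @ x)                   # all hidden activations identical
y_hat = w2 @ h
dL_dy = y_hat - y                     # squared-error gradient
dL_dW1 = np.outer(dL_dy * w2 * (1 - h**2), x)

print(dL_dW1)                         # every row is identical -> units stay clones
```
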
6 votes · 1 answer

How are the kernels initialized in a convolutional neural network?

I am currently learning about CNNs. I am confused about how filters (aka kernels) are initialized. Suppose that we have a $3 \times 3$ kernel. How are the values of this filter initialized before training? Do you just use predefined image kernels?…
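
For reference, convolution kernels are not set to predefined image filters; they are drawn randomly like any other weight tensor, typically with a fan-in-scaled scheme. A minimal PyTorch sketch with assumed channel counts:

```python
# Sketch: conv kernels are initialized with a fan-in-scaled random scheme,
# not with hand-crafted filters such as edge detectors.
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
nn.init.kaiming_normal_(conv.weight, nonlinearity='relu')   # He init for ReLU nets
nn.init.zeros_(conv.bias)

print(conv.weight.std().item())   # roughly sqrt(2 / (3 * 3 * 3)) ≈ 0.27
```
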
5 votes · 0 answers

What is the justification for Kaiming He initialization?

I've been trying to understand where the formulas for Xavier and Kaiming He initialization come from. My understanding is that these initialization schemes come from a desire to keep the gradients stable during back-propagation (avoiding…
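
For reference, my rough reconstruction of the variance-preservation argument behind He initialization (following He et al., 2015; a sketch, not a full proof): for a layer $y_l = W_l x_l$ with $x_l = \mathrm{ReLU}(y_{l-1})$ and i.i.d. zero-mean weights over $n_l$ inputs,

$$\operatorname{Var}(y_l) = n_l \operatorname{Var}(w_l)\, \mathbb{E}[x_l^2], \qquad \mathbb{E}[x_l^2] = \tfrac{1}{2}\operatorname{Var}(y_{l-1}),$$

since ReLU keeps only the positive half of a symmetric $y_{l-1}$. Requiring $\operatorname{Var}(y_l) = \operatorname{Var}(y_{l-1})$ at every layer then gives

$$\operatorname{Var}(w_l) = \frac{2}{n_l}.$$
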
4 votes · 1 answer

Do we know what the units of neural networks will do before we train them?

I was learning about back-propagation and, looking at the algorithm, there is no particular 'partiality' given to any unit. What I mean by partiality there is that you have no particular characteristic associated with any unit, and this results in…
4 votes · 0 answers

Can the quality of randomness in neural network initialization affect model fitting?

This is a topic I have been arguing about for some time now with my colleagues, maybe you could also voice your opinion about it. Artificial neural networks use random weight initialization within a certain value range. These random parameters are…
3 votes · 1 answer

Initial State of RNN

Can I initialize the initial state of my RNN to be non-zero? I have some initial condition of the sequence and I want to use this initial condition as the initial state.
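
For reference, most RNN APIs let you pass the initial hidden state explicitly, so it can encode a known initial condition. A minimal PyTorch sketch with assumed sizes:

```python
# Sketch: pass an explicit initial hidden state to the recurrent layer
# (here nn.GRU); omitting it defaults to a zero initial state.
import torch
import torch.nn as nn

rnn = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 10, 8)      # (batch, seq_len, features)
h0 = torch.randn(1, 4, 16)     # (num_layers, batch, hidden): e.g. an encoding
                               # of the sequence's initial condition
out, h_n = rnn(x, h0)
print(out.shape, h_n.shape)
```
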
3 votes · 0 answers

How efficient is SCAWI weight initialization method?

I'm currently in the middle of a project (for my thesis) constructing a deep neural network. Since I'm still in the research part, I'm trying to find various ways and techniques to initialize weights. Obviously, every way will be evaluated and we…
3 votes · 1 answer

How are newer weight initialization techniques better than zero or random initialization?

How do newer weight initialization techniques (He, Xavier, etc) improve results over zero or random initialization of weights in a neural network? Is there any mathematical evidence behind this?
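
For reference, a quick numeric illustration (my own sketch, with arbitrary depth and width) of why the scale of the initialization matters: with an unscaled Gaussian init the signal vanishes or explodes with depth, while a fan-in-scaled (He) init keeps its magnitude roughly constant.

```python
# Sketch: propagate one input through 20 ReLU layers and watch the activation
# scale under a poorly scaled init vs. a fan-in-scaled (He) init.
import numpy as np

rng = np.random.default_rng(0)
n, depth = 512, 20
x0 = rng.standard_normal(n)

for label, std in [("std=0.01 (too small)", 0.01),
                   ("He: sqrt(2/n)", np.sqrt(2.0 / n))]:
    x = x0.copy()
    for _ in range(depth):
        W = rng.standard_normal((n, n)) * std
        x = np.maximum(W @ x, 0.0)        # ReLU
    print(f"{label:>22}: final activation std = {x.std():.3e}")
```
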
3 votes · 1 answer

How does the initialization of the value function and definition of the reward function affect the performance of the RL agent?

Is there any empirical/theoretical evidence on the effect of the initial state-action and state values (the values an RL agent assigns to visited states) on the training of an RL agent via MC methods Policy Evaluation and GLIE Policy…
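
For reference, one well-known effect of value initialization is optimistic initialization: starting the value estimates above any achievable return pushes a greedy agent to keep trying untested actions. A tiny NumPy sketch (made-up table sizes and returns, not tied to any specific environment):

```python
# Sketch: optimistic vs. zero initialization of a Q-table. Optimistic values
# make a greedy agent explore untried actions early on.
import numpy as np

n_states, n_actions = 5, 3
q_zero       = np.zeros((n_states, n_actions))
q_optimistic = np.full((n_states, n_actions), 10.0)   # assumed > any achievable return

# After one observed return of 1.0 for (state 0, action 0):
for q in (q_zero, q_optimistic):
    q[0, 0] = 1.0

print(np.argmax(q_zero[0]))        # 0 -> greedy agent repeats the only tried action
print(np.argmax(q_optimistic[0]))  # 1 -> greedy agent moves on to an untried action
```
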
2 votes · 0 answers

Is orthogonal initialization still useful when hidden layer sizes vary?

PyTorch's orthogonal initialization cites "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks", Saxe, A. et al. (2013), which attributes the usefulness of orthogonal initialization to the fact that, for a…
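
For reference, `torch.nn.init.orthogonal_` also accepts rectangular weights, in which case it produces a semi-orthogonal matrix: the Gram matrix along the smaller dimension is the identity. A quick check with assumed layer sizes:

```python
# Sketch: orthogonal init on a non-square weight gives a semi-orthogonal matrix
# (here the rows are orthonormal because there are fewer rows than columns).
import torch
import torch.nn as nn

W = torch.empty(64, 256)          # hidden sizes differ: 256 -> 64
nn.init.orthogonal_(W)

gram = W @ W.T                    # (64, 64)
print(torch.allclose(gram, torch.eye(64), atol=1e-5))   # True: rows orthonormal
```
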
2 votes · 1 answer

How to decode P bits that represent a random weight generator?

So I've been tasked by my neural network professor at university to replicate the following research: Intelligent Breast Cancer Diagnosis Using Hybrid GA-ANN. Each chromosome represents a possible net, more specifically, a possible MLP network…