Questions tagged [weights]
For questions about the concept of a weight (or parameter) of a machine learning model, such as a neural network or a linear regression model.
84 questions
66
votes
12 answers
In a CNN, does each new filter have different weights for each input channel, or are the same weights of each filter used across input channels?
My understanding is that the convolutional layer of a convolutional neural network has four dimensions: input_channels, filter_height, filter_width, number_of_filters. Furthermore, it is my understanding that each new filter just gets convolved…

Ryan Chase
- 793
- 1
- 6
- 6
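A quick way to see the layout in code is to inspect the weight tensor of a convolution layer. A minimal sketch, assuming PyTorch (not part of the question) and hypothetical layer sizes:

import torch.nn as nn

# Hypothetical sizes: 3 input channels, 8 filters, 5x5 kernels.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)

# In PyTorch the weight shape is (number_of_filters, input_channels, filter_height, filter_width).
print(conv.weight.shape)  # torch.Size([8, 3, 5, 5])
# Each of the 8 filters has its own 3 x 5 x 5 weight block,
# i.e. different weights for each input channel.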
16
votes
5 answers
Why are the initial weights of neural networks randomly initialised?
This might sound silly to someone who has plenty of experience with neural networks but it bothers me...
Random initial weights might give you better results that would be somewhat closer to what a trained neural network should look like, but it…

Matas Vaitkevicius
- 271
- 5
- 12
7
votes
1 answer
Is there a proper initialization technique for the weight matrices in multi-head attention?
Self-attention layers have 4 learnable tensors (in the vanilla formulation):
Query matrix $W_Q$
Key matrix $W_K$
Value matrix $W_V$
Output matrix $W_O$
Nice illustration from https://jalammar.github.io/illustrated-transformer/
However, I do not…

spiridon_the_sun_rotator
- 2,454
- 8
- 16
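For illustration only, a minimal sketch (assuming PyTorch and a hypothetical model dimension) of one common default, Xavier/Glorot initialization applied to the four projection matrices; this is not presented as the "proper" technique the question asks about:

import torch.nn as nn

d_model = 512  # hypothetical model dimension

# The four learnable projections of a self-attention layer.
w_q = nn.Linear(d_model, d_model, bias=False)
w_k = nn.Linear(d_model, d_model, bias=False)
w_v = nn.Linear(d_model, d_model, bias=False)
w_o = nn.Linear(d_model, d_model, bias=False)

# One common choice: Xavier/Glorot uniform initialization for each projection.
for proj in (w_q, w_k, w_v, w_o):
    nn.init.xavier_uniform_(proj.weight)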
7
votes
0 answers
Why is there a Uniform and Normal version of He / Xavier initialization in DL libraries?
Two of the most popular initialization schemes for neural network weights today are Xavier and He. Both methods propose random weight initialization with a variance dependent on the number of input and output units. Xavier proposes
$$W \sim…

Tinu
- 618
- 1
- 4
- 12
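For reference (a standard fact, not taken from the truncated excerpt), the normal and uniform variants of Xavier initialization target the same variance; the uniform bound is chosen so that the variances match:

$$W \sim \mathcal{N}\!\left(0,\ \frac{2}{n_\text{in} + n_\text{out}}\right) \quad\text{or}\quad W \sim \mathcal{U}\!\left(-\sqrt{\frac{6}{n_\text{in} + n_\text{out}}},\ \sqrt{\frac{6}{n_\text{in} + n_\text{out}}}\right),$$

both with $\operatorname{Var}(W) = \frac{2}{n_\text{in} + n_\text{out}}$, since a uniform distribution on $[-a, a]$ has variance $a^2/3$. The He variants are analogous, with $2/n_\text{in}$ in place of $2/(n_\text{in} + n_\text{out})$.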
6
votes
2 answers
What is the goal of weight initialization in neural networks?
This is a simple question. I know the weights in a neural network can be initialized in many different ways like: random uniform distribution, normal distribution, and Xavier initialization. But what is the weight initialization trying to…

S2673
- 560
- 4
- 16
6
votes
2 answers
Can neurons in MLP and filters in CNN be compared?
I know they are not the same in how they work, but an input layer sends the input to $n$ neurons with a set of weights; based on these weights and the activation layer, it produces an output that can be fed to the next layer.
Aren't the filters the same,…

Tibo Geysen
- 193
- 5
5
votes
2 answers
What do the neural network's weights represent conceptually?
I understand how neural networks work and have studied their theory well.
My question is: On the whole, is there a clear understanding of how mutation occurs within a neural network from the input layer to the output layer, for both supervised and…

user248884
- 151
- 3
5
votes
1 answer
Why did the development of neural networks stop between the 50s and the 80s?
In a video lecture on the development of neural networks and the history of deep learning (you can start from minute 13), the lecturer (Yann LeCun) said that the development of neural networks stopped until the 80s because people were using the…

Daviiid
- 563
- 3
- 15
4
votes
1 answer
Do we know what the units of neural networks will do before we train them?
I was learning about back-propagation and, looking at the algorithm, there is no particular 'partiality' given to any unit. What I mean by partiality there is that you have no particular characteristic associated with any unit, and this results in…

Htnamus
- 43
- 6
4
votes
1 answer
In TD(0) with linear function approximation, why is the gradient of $\hat v(S^{\prime}, \mathbf w)$ wrt parameters $\mathbf w$ not considered?
I am reading these slides. On page 38, the update for the parameters for the linear function approximation of TD(0) is given. I have a doubt regarding this.
The cost function (RMSE) is given on page 37.
My doubt is: why is the gradient of $\hat…

A Yoghes
- 43
- 4
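For context (standard form of the update, e.g. Sutton and Barto, not quoted from the slides), TD(0) with function approximation uses a semi-gradient: only $\hat v(S, \mathbf w)$ is differentiated, while the bootstrapped target containing $\hat v(S', \mathbf w)$ is treated as a constant:

$$\mathbf w \leftarrow \mathbf w + \alpha \left[ R + \gamma\, \hat v(S', \mathbf w) - \hat v(S, \mathbf w) \right] \nabla_{\mathbf w}\, \hat v(S, \mathbf w),$$

which, for a linear approximator $\hat v(S, \mathbf w) = \mathbf w^\top \mathbf x(S)$, reduces to $\nabla_{\mathbf w}\, \hat v(S, \mathbf w) = \mathbf x(S)$.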
4
votes
0 answers
Why does sigmoid saturation prevent signal flow through the neuron?
As per these slides on page 35:
Sigmoids saturate and kill gradients.
When the neuron's activation saturates at either tail of 0 or 1, the gradient at these regions is almost zero.
[…] the gradient and almost no signal will flow through the neuron…

EEAH
- 193
- 1
- 5
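As a small numerical sketch (not part of the question): the sigmoid's derivative is $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$, which peaks at $0.25$ and collapses in the tails, so a saturated neuron multiplies the backpropagated signal by a near-zero factor.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Local gradient near zero input vs. deep in a saturated tail.
print(sigmoid_grad(0.0))   # 0.25   (maximum)
print(sigmoid_grad(5.0))   # ~0.0066
print(sigmoid_grad(10.0))  # ~4.5e-05 -> almost no signal flows back through the neuron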
4
votes
0 answers
When is using weight regularization bad?
Regularization of weights (e.g. L1 or L2) keeps them small and standardized, which can help reduce data overfitting. From this article, regularization sounds favorable in many cases, but is it always encouraged? Are there scenarios in which it…

mark mark
- 753
- 4
- 23
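For reference, the two penalties the question refers to are typically added to the data loss with a regularization strength $\lambda$ (standard formulation, not quoted from the linked article):

$$L_\text{total} = L_\text{data} + \lambda \sum_i |w_i| \ \ (\text{L1}), \qquad L_\text{total} = L_\text{data} + \lambda \sum_i w_i^2 \ \ (\text{L2}),$$

so a larger $\lambda$ pushes the weights toward zero, with L1 tending to produce exactly-zero (sparse) weights.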
4
votes
1 answer
How are the parameters of the Bernoulli distribution learned?
In the paper Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask, they learn a mask for the network by setting up the mask parameters as $M_i = \text{Bern}(\sigma(v_i))$, where $M$ is the parameter mask ($f(x; \theta, M) = f(x; M \odot \theta)$),…

mshlis
- 2,349
- 7
- 23
3
votes
1 answer
What is the significance of weights in a feedforward neural network?
In a feedforward neural network, the inputs are fed directly to the outputs via a series of weights.
What purpose do the weights serve, and how are they significant in this neural network?

kenorb
- 10,423
- 3
- 43
- 91
3
votes
0 answers
Are there neural networks with (hard) constraints on the weights?
I don't know too much about Deep Learning, so my question might be silly. However, I was wondering whether there are NN architectures with some hard constraints on the weights of some layers. For example, let $(W^k_{ij})_{ij}$ be the weights of the…

Onil90
- 173
- 5