For questions related to the gradient, a way of packing together all the partial derivative information of a function
Questions tagged [gradient]
44 questions
10
votes
1 answer
What is the relationship between gradient accumulation and batch size?
I am currently training some models using gradient accumulation, since the batches do not fit in GPU memory. Because I am using gradient accumulation, I had to tweak the training configuration a bit. There are two parameters that I tweaked: the…

JVGD
- 1,088
- 1
- 6
- 14
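For readers unfamiliar with the mechanics behind this question, here is a minimal PyTorch-style sketch of gradient accumulation; the toy model, data, and accumulation_steps value are illustrative assumptions, not taken from the question:

```python
import torch
from torch import nn

# Toy setup; the model, data, and hyperparameters below are placeholders.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
micro_batches = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(8)]

accumulation_steps = 4            # effective batch size = 8 * 4 = 32
optimizer.zero_grad()
for step, (x, y) in enumerate(micro_batches):
    # Scale the loss so the summed gradients match one large batch of 32.
    loss = criterion(model(x), y) / accumulation_steps
    loss.backward()               # gradients accumulate in the .grad buffers
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()          # one parameter update per 4 micro-batches
        optimizer.zero_grad()
```

Because the loss is divided by accumulation_steps, the accumulated gradient approximates the gradient of one large batch, which is why the learning rate is usually tied to the effective (large) batch size rather than the micro-batch size.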
6
votes
1 answer
How is the gradient of the loss function in DQN derived?
In the original DQN paper, page 1, the loss function of the DQN is
$$
L_{i}(\theta_{i}) = \mathbb{E}_{(s,a,r,s') \sim U(D)} [(r+\gamma \max_{a'} Q(s',a';\theta_{i}^{-}) - Q(s,a;\theta_{i}))^2]
$$
whose gradient is presented (on page…

Dimitris Monroe
- 171
- 8
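One reasoning step that usually resolves this question: treat the target $y = r + \gamma \max_{a'} Q(s',a';\theta_{i}^{-})$ as a constant (it depends only on the frozen parameters $\theta_{i}^{-}$) and apply the chain rule to the squared error:
$$
\nabla_{\theta_{i}} L_{i}(\theta_{i}) = \mathbb{E}_{(s,a,r,s') \sim U(D)} \left[ -2 \left( y - Q(s,a;\theta_{i}) \right) \nabla_{\theta_{i}} Q(s,a;\theta_{i}) \right],
$$
which differs from the expression in the paper only by the constant factor $-2$: the 2 can be folded into the learning rate, and the sign depends on whether one writes the gradient itself or the descent direction.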
5
votes
2 answers
Why is the derivative of this objective function 0 if the policy is deterministic?
In the Berkeley RL class CS294-112 (Fa18, 9/5/18), they mention that the following gradient would be 0 if the policy is deterministic.
$$
\nabla_{\theta} J(\theta)=E_{\tau \sim \pi_{\theta}(\tau)}\left[\left(\sum_{t=1}^{T} \nabla_{\theta} \log…

jonperl
- 153
- 7
5
votes
2 answers
Why is tf.abs non-differentiable in Tensorflow?
I understand why tf.abs is non-differentiable in principle (its derivative is discontinuous at 0), but the same applies to tf.nn.relu, yet for that function the gradient is simply set to 0 at 0. Why is the same logic not applied to tf.abs? Whenever I tried to use…

zedsdead
- 53
- 3
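A quick way to check what TensorFlow actually reports here (a minimal sketch; the behaviour at exactly 0 is the point in dispute, so it is worth inspecting rather than assuming):

```python
import tensorflow as tf

for fn in (tf.abs, tf.nn.relu):
    x = tf.Variable(0.0)
    with tf.GradientTape() as tape:
        y = fn(x)
    # What gradient does TensorFlow register for this op at x = 0?
    print(fn.__name__, tape.gradient(y, x).numpy())
```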
5
votes
2 answers
Is the gradient at a layer independent of the activations of the previous layers?
Is the gradient at a layer (of a feed-forward neural network) independent of the activations of the previous layers?
I read this in a paper titled Mean Field Residual Networks: On the Edge of Chaos (2017). I am not sure to what extent this is true, because…

Snehal Reddy
- 69
- 4
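For reference, the standard backpropagation expression for a fully connected layer $l$ with pre-activations $z^{l} = W^{l} a^{l-1} + b^{l}$ is
$$
\frac{\partial L}{\partial W^{l}} = \delta^{l} \left( a^{l-1} \right)^{\top}, \qquad \delta^{l} = \frac{\partial L}{\partial z^{l}},
$$
so the exact weight gradient at layer $l$ explicitly contains the previous layer's activations $a^{l-1}$; any independence claimed in the paper has to come from additional modelling assumptions (it works in a mean-field setting), not from this formula.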
4
votes
1 answer
Why is it a problem if the outputs of an activation function are not zero-centered?
In this lecture, the professor says that one problem with the sigmoid function is that its outputs aren't zero-centered. The explanation provided by the professor regarding why this is bad is that the gradient of our loss w.r.t. the weights…

Daviiid
- 563
- 3
- 15
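The usual argument, sketched for a single neuron $z = \sum_i w_i x_i + b$ whose inputs $x_i$ are sigmoid outputs of the previous layer and therefore all positive:
$$
\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial z} \, x_i ,
$$
so every weight gradient shares the sign of $\partial L / \partial z$. A single update can then only move all of that neuron's weights up together or down together, and reaching a target with mixed signs requires an inefficient zig-zag of updates.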
3
votes
0 answers
Why does training converge when the norm of the gradient increases?
This is from the deep learning book by Ian Goodfellow, Yoshua Bengio and Aaron Courville.
I thought that, when training converges well, we should be at a local minimum (where the gradient vanishes). But the book says training often does not arrive at a critical point. Could you…

tesio
- 185
- 4
3
votes
1 answer
Why is automatic differentiation still used, if today's computers can calculate symbolic derivatives quite fast?
Today's computers can calculate symbolic derivatives quite fast, so why is automatic differentiation still used? For example, Mathematica can handle algebraic operations with arrays. Doesn't automatic differentiation cause significant overhead?…

asd
- 33
- 2
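Part of the answer is that automatic differentiation is not "symbolic differentiation plus evaluation": it propagates numeric derivative values alongside the ordinary computation, so it never builds or simplifies a symbolic expression and avoids expression swell. A toy forward-mode sketch using dual numbers (illustrative only; this is not how production frameworks are implemented):

```python
# Minimal forward-mode autodiff: each value carries (value, derivative).
class Dual:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule applied numerically, not symbolically.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__


def f(x):
    return 3 * x * x + 2 * x + 1   # f'(x) = 6x + 2


y = f(Dual(2.0, 1.0))              # seed dx/dx = 1
print(y.val, y.der)                # 17.0 14.0
```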
2
votes
0 answers
How to prepare audio data for deep learning?
Audio data is typically an array with the waveform represented by values from -1 to 1. There are two issues with that:
if all values are inverted, e.g. -1 becomes 1 and 1 becomes -1, the audio doesn't change. But if for example I need to find…

nikishev.
- 21
- 3
2
votes
0 answers
GAN : Why does a perfect discriminator mean no gradient for the generator?
In the training of a Generative Adversarial Networks (GAN) system, a perfect discriminator (D) is one which outputs 1 ("true image") for all images of the training dataset and 0 ("false image") for all images created by the generator (G).
I've read…

Soltius
- 221
- 1
- 8
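One compact way to see it (a sketch, writing the discriminator's output as a sigmoid $D = \sigma(x)$ of its logit $x$): the saturating generator objective differentiates to
$$
\frac{\partial}{\partial x} \log\left(1 - \sigma(x)\right) = -\sigma(x),
$$
so when the discriminator is (near-)perfect and assigns $\sigma(x) \approx 0$ to generated images, essentially no gradient flows back through $x$ into the generator.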
2
votes
2 answers
What is meant by "gradient flow" in the context of neural networks?
Several research papers and textbooks (e.g. this) contain the phrase "gradient flow" in the context of neural networks.
I am confused about whether or not there is a rigorous and formal way to understand it. What is the flow referring to here?

hanugm
- 3,571
- 3
- 18
- 50
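For orientation: in the optimization literature, "gradient flow" usually means the continuous-time limit of gradient descent, the ODE
$$
\frac{\mathrm{d}\theta(t)}{\mathrm{d}t} = -\nabla_{\theta} L\left(\theta(t)\right),
$$
of which gradient descent with step size $\eta$, $\theta_{k+1} = \theta_k - \eta \nabla_{\theta} L(\theta_k)$, is the Euler discretization. Deep learning papers also use the phrase informally for how gradients propagate ("flow") backwards through the layers, so which meaning applies depends on the paper.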
2
votes
2 answers
What specifically is the gradient of the log of the probability in policy gradient methods?
I am getting tripped up slightly by how specifically the gradient is calculated in policy gradient methods (just the intuitive understanding of it).
This Math Stack Exchange post is close, but I'm still a little confused.
In standard supervised…

user9317212
- 161
- 2
- 10
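As a concrete instance (the Gaussian policy below is an illustrative assumption, not part of the question): for $\pi_{\theta}(a \mid s) = \mathcal{N}\left(a;\, \mu_{\theta}(s), \sigma^{2}\right)$ with fixed $\sigma$,
$$
\nabla_{\theta} \log \pi_{\theta}(a \mid s) = \frac{a - \mu_{\theta}(s)}{\sigma^{2}} \, \nabla_{\theta} \mu_{\theta}(s),
$$
i.e. the score points in the parameter direction that makes the sampled action more likely, and the policy-gradient estimator simply weights this direction by the return.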
2
votes
1 answer
What is meant by a strong or sufficient gradient for training in this context?
The research paper titled Generative Adversarial Nets mentions that the generator should maximize the function $\log D(G(z))$ instead of minimizing $\log(1 - D(G(z)))$, since the former provides a sufficient gradient while the latter does not.
$$\min_G…

hanugm
- 3,571
- 3
- 18
- 50
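A sketch of the comparison behind that sentence, writing the discriminator's output on a generated sample as $D(G(z)) = \sigma(x)$ with logit $x$:
$$
\frac{\partial}{\partial x} \log\left(1 - \sigma(x)\right) = -\sigma(x), \qquad \frac{\partial}{\partial x} \log \sigma(x) = 1 - \sigma(x).
$$
Early in training the discriminator rejects generated samples easily, so $\sigma(x) \approx 0$: the minimized objective then gives a gradient near $0$, while the maximized one gives a gradient near $1$, which is the "sufficient gradient" the paper refers to.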
2
votes
1 answer
What is $\nabla_{\theta_{k-1}} \theta_{k}$ in the context of MAML?
I am attempting to fully understand the explicit derivation and computation of the Hessian and how it is used in MAML. I came across this blog: https://lilianweng.github.io/lil-log/2018/11/30/meta-learning.html.
Specifically, could someone help to…

Blake Camp
- 23
- 2
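For orientation (a standard identity, assuming the usual one-step inner update with learning rate $\alpha$): if the inner loop sets $\theta_{k} = \theta_{k-1} - \alpha \nabla_{\theta_{k-1}} \mathcal{L}(\theta_{k-1})$, then
$$
\nabla_{\theta_{k-1}} \theta_{k} = I - \alpha \nabla^{2}_{\theta_{k-1}} \mathcal{L}(\theta_{k-1}),
$$
the Jacobian of the updated parameters with respect to the previous ones. This is where the Hessian enters the MAML meta-gradient, and it is exactly the term that first-order MAML drops by approximating the Jacobian with $I$.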
2
votes
2 answers
How can we compute the gradient of max pooling with overlapping regions?
While studying backpropagation in CNNs, I can't understand how we can compute the gradient of max pooling with overlapping regions.
This is also a question from this quiz and can also be found in this book.

estamos
- 157
- 1
- 12
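A small NumPy sketch of the backward pass for 1-D max pooling (illustrative; the function name and shapes are assumptions, not taken from the quiz or the book): each upstream gradient is routed to the argmax of its window, and when windows overlap, gradients that land on the same input position are simply summed.

```python
import numpy as np

def maxpool1d_backward(x, grad_out, size, stride):
    """Route each upstream gradient to the argmax of its window; overlapping
    windows accumulate (sum) their gradients at shared input positions."""
    grad_x = np.zeros_like(x)
    n_out = (len(x) - size) // stride + 1
    for i in range(n_out):
        start = i * stride
        window = x[start : start + size]
        grad_x[start + np.argmax(window)] += grad_out[i]
    return grad_x

x = np.array([1.0, 5.0, 2.0, 3.0])
print(maxpool1d_backward(x, grad_out=np.ones(2), size=3, stride=1))
# [0. 2. 0. 0.]  -> 5.0 wins both overlapping windows, so its gradients add up
```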