Questions tagged [derivative]
15 questions
4
votes
1 answer
Why is my derivation of the back-propagation equations inconsistent with Andrew Ng's slides from Coursera?
I am using the cross-entropy cost function and computing its derivatives with respect to the variables $Z$, $W$ and $b$ at different stages. Please refer to the image below for the calculation.
As per my knowledge, my derivation is correct for $dZ, dW, db$ and…

learner
- 151
- 5
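As a quick reference for readers (not part of the question itself), here is a minimal sketch of the standard result under Andrew Ng's conventions, assuming a single sigmoid unit and a cross-entropy cost averaged over $m$ examples:

```python
import numpy as np

# Minimal sketch under assumed shapes: X is (n_features, m), Y is (1, m),
# W is (1, n_features), b is (1, 1); cost is cross-entropy averaged over m.
def backward(X, Y, W, b):
    m = X.shape[1]
    Z = W @ X + b                            # pre-activations, shape (1, m)
    A = 1.0 / (1.0 + np.exp(-Z))             # sigmoid activations
    dZ = A - Y                               # dJ/dZ for sigmoid + cross-entropy
    dW = (dZ @ X.T) / m                      # dJ/dW, shape (1, n_features)
    db = dZ.sum(axis=1, keepdims=True) / m   # dJ/db, shape (1, 1)
    return dZ, dW, db
```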
2
votes
1 answer
How is the max function differentiable wrt multiple arguments?
I recently came across an answer on StackOverflow that mentioned the max function being differentiable with respect to its values.
From my current understanding of mathematics, I'm struggling to comprehend how this is possible.
Could someone help…

Peyman
- 534
- 3
- 10
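For context (an illustration, not the linked answer): away from ties, $\max(x_1, \dots, x_n)$ is differentiable, and its partial derivative is 1 for the arg-max entry and 0 for every other argument; at ties the max is only subdifferentiable.

```python
import numpy as np

# Sketch: the gradient of max(x) is an indicator on the arg-max entry,
# valid wherever the maximum is attained by a unique argument.
def max_grad(x):
    g = np.zeros_like(x, dtype=float)
    g[np.argmax(x)] = 1.0
    return g

print(max_grad(np.array([0.2, 1.5, -0.3])))  # [0. 1. 0.]
```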
2
votes
2 answers
Why are critical points and stationary points used interchangeably?
Consider the following paragraph from the Numerical Computation chapter of the deep learning book.
When $f'(x) = 0$, the derivative provides no information about which
direction to move. Points where $f'(x) = 0$ are known as critical
points, or stationary…

hanugm
- 3,571
- 3
- 18
- 50
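A small illustration of the distinction being asked about (not from the book's text): $f(x) = x^3$ has a critical/stationary point at $x = 0$ that is neither a minimum nor a maximum.

```python
# f(x) = x**3 has f'(0) = 0, yet f is strictly increasing through x = 0,
# so the critical point is neither a local minimum nor a local maximum.
f = lambda x: x**3
df = lambda x: 3 * x**2   # analytic derivative

print(df(0.0))                       # 0.0 -> critical (stationary) point
print(f(-1e-3) < f(0.0) < f(1e-3))   # True -> no extremum at 0
```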
2
votes
0 answers
What is the dimensionality of these derivatives in the paper "Active Learning for Reward Estimation in Inverse Reinforcement Learning"?
I'm trying to implement part of the following paper in code: Active Learning for Reward Estimation in Inverse Reinforcement Learning.
I'm specifically referring to section 2.3 of the paper.
Let's define $\mathcal{X}$ as the set of states, and…

ИванКарамазов
- 141
- 5
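The paper's specific quantities are not reproduced here, but the usual shape bookkeeping behind such questions is: for a map $f : \mathbb{R}^n \to \mathbb{R}^m$, the derivative is an $m \times n$ Jacobian. A generic finite-difference check (my own sketch, not the paper's notation):

```python
import numpy as np

# Generic shape check: for f : R^n -> R^m, the Jacobian df/dx is (m, n),
# estimated here with forward finite differences.
def jacobian(f, x, eps=1e-6):
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        J[:, j] = (f(x + e) - fx) / eps
    return J

f = lambda x: np.array([x[0] * x[1], np.sin(x[2]), x.sum()])
print(jacobian(f, np.array([1.0, 2.0, 0.5])).shape)  # (3, 3)
```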
1
vote
2 answers
What does "having Lipschitz continuous derivatives" mean?
We can enforce some constraints on the functions used in deep learning in order to obtain optimization guarantees. You can find this in the Numerical Computation chapter of the deep learning book.
In the context of deep learning, we sometimes gain some guarantees…

hanugm
- 3,571
- 3
- 18
- 50
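For reference, the standard definition (a well-known fact, not quoted from the book's excerpt): $f$ has an $L$-Lipschitz continuous gradient when

$$\|\nabla f(x) - \nabla f(y)\| \leq L \, \|x - y\| \quad \text{for all } x, y,$$

which implies the quadratic upper bound $f(y) \leq f(x) + \nabla f(x)^\top (y - x) + \tfrac{L}{2}\|y - x\|^2$; this bound is what guarantees that gradient descent with step size at most $1/L$ decreases the objective.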
1
vote
0 answers
BlackOut - ICLR 2016: need help understanding the cost function derivative
In the ICLR 2016 paper BlackOut: Speeding up Recurrent Neural Network Language Models with very Large Vocabularies, on page 3, for eq. 4:
$$ J_{ml}^s(\theta) = \log p_{\theta}(w_i \mid s) $$
They have shown the gradient computation in the subsequent…

anurag
- 151
- 1
- 7
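As background (generic, not BlackOut's sampled approximation): with $p = \mathrm{softmax}(z)$, the gradient of $\log p_i$ with respect to the logits is $e_i - p$, which is the starting point for the paper's gradient computation.

```python
import numpy as np

# Generic log-softmax gradient: with p = softmax(z),
# d log p[i] / d z = onehot(i) - p. BlackOut replaces the full softmax
# with a sampled objective, which this sketch does not cover.
def grad_log_softmax(z, i):
    p = np.exp(z - z.max())   # shift for numerical stability
    p /= p.sum()
    g = -p
    g[i] += 1.0
    return g

print(grad_log_softmax(np.array([0.1, 2.0, -1.0]), 1))
```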
0
votes
0 answers
How to select pseudo label samples that minimize validation loss?
I have a problem concerning meta pseudo labeling: I want to select the most significant pseudo-labels, namely those that minimize the validation loss. Let's say I initialize a set of pseudo-labels denoted $Y_{pseudo}$, then I perform a parameter update by gradient…

Việt Nguyễn
- 1
- 1
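One naive way to make the question concrete (all names hypothetical, and only a sketch of one possible selection scheme, not an answer from the literature):

```python
import numpy as np

# Hypothetical sketch: score each candidate pseudo-labelled sample by the
# validation loss after a one-step update that uses it, then keep the k
# samples that lead to the lowest validation loss.
def select_pseudo_labels(train_step, val_loss, params, candidates, k):
    scores = []
    for x, y_pseudo in candidates:
        new_params = train_step(params, x, y_pseudo)  # one gradient step
        scores.append(val_loss(new_params))
    best = np.argsort(scores)[:k]
    return [candidates[i] for i in best]
```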
0
votes
0 answers
Why is there the term $1/m$ in backpropagation?
In backpropagation the gradients are used to update the weights using the formula
$$w = w - \alpha \frac{dL}{dw}$$
and the loss gradient w.r.t. weights is
$$\frac{dL}{dw} = \frac{dL}{dz} \frac{dz}{dw} = (\frac{dL}{da} \frac{da}{dz} \frac{1}{m})…

rkuang25
- 21
- 4
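A short numerical note on where the $1/m$ comes from (an illustration, not the full answer): the cost averages the per-example losses, $L = \frac{1}{m}\sum_i \ell_i$, so every parameter gradient inherits the factor $1/m$.

```python
import numpy as np

# The 1/m appears because L = (1/m) * sum_i l_i: the gradient of an average
# is the average of the per-example gradients.
m = 4
dl_dw = np.array([0.2, -0.1, 0.4, 0.3])      # dl_i/dw for each example
dL_dw = dl_dw.sum() / m                      # gradient of the averaged cost
print(np.isclose(dL_dw, dl_dw.mean()))       # True
```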
0
votes
1 answer
What is the correct partial derivative of $Y^c$ with respect to $A_{ij}^{kc}$?
I have a question about the Grad-CAM++ paper. I do not understand how the following equation (10) for the alphas is obtained:
$$
\alpha_{ij}^{kc} =
\frac{\frac{\partial^2 Y^c}{(\partial A_{ij}^k)^2}}
{2\frac{\partial^2 Y^c}{(\partial A_{ij}^k)^2}
…

mlerma54
- 141
- 5
0
votes
2 answers
How does the vector-space isomorphism between $\mathbb{R}^{m \times n}$ and $\mathbb{R}^{mn}$ justify reshaping matrices into vectors?
Consider the following paragraph from section 5.4 Gradients of Matrices of the chapter Vector Calculus from the textbook titled Mathematics for Machine Learning by Marc Peter Deisenroth et al.
Since matrices represent linear mappings, we can…

hanugm
- 3,571
- 3
- 18
- 50
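A concrete illustration of the isomorphism in question (my own sketch, not the textbook's): reshaping is a linear bijection between $\mathbb{R}^{m \times n}$ and $\mathbb{R}^{mn}$, so nothing is lost by computing gradients on the flattened vector and reshaping back.

```python
import numpy as np

# Reshaping (vec) is linear and invertible, which is exactly what the
# vector-space isomorphism between R^(m x n) and R^(mn) asserts.
m, n = 2, 3
A = np.arange(m * n, dtype=float).reshape(m, n)
B = np.ones((m, n))

v = A.reshape(-1)                        # vec: R^(m x n) -> R^(mn)
assert (v.reshape(m, n) == A).all()      # inverse map recovers A exactly
assert np.allclose((2 * A + 3 * B).reshape(-1),
                   2 * v + 3 * B.reshape(-1))  # vec is linear
```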
0
votes
1 answer
What is the rigorous and formal definition for the direction pointed by a gradient?
Consider the following definition of the derivative from the chapter named Vector Calculus from the textbook titled Mathematics for Machine Learning by Marc Peter Deisenroth et al.
Definition 5.2 (Derivative). More formally, for $h>0$ the…

hanugm
- 3,571
- 3
- 18
- 50
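For reference, the usual way this is made rigorous (standard calculus, not quoted from the book): for a unit vector $v$, the directional derivative is

$$D_v f(x) = \lim_{h \to 0} \frac{f(x + hv) - f(x)}{h} = \nabla f(x)^\top v,$$

and by the Cauchy-Schwarz inequality $\nabla f(x)^\top v \leq \|\nabla f(x)\|$, with equality exactly when $v = \nabla f(x) / \|\nabla f(x)\|$. This is the precise sense in which the gradient points in the direction of steepest ascent.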
0
votes
1 answer
How to understand the slope of a (non-convex) function at a point in its domain?
Consider the following paragraph from the Numerical Computation chapter of the deep learning book, which describes the derivative as the slope of the function's curve at a point:
Suppose we have a function $y= f(x)$, where both $x$ and $y$ are real
numbers. The derivative of this…

hanugm
- 3,571
- 3
- 18
- 50
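A quick numerical illustration (mine, not the book's): the slope at a point is a purely local quantity, so it is well defined even when $f$ is non-convex.

```python
import numpy as np

# The derivative at x is the local linear rate of change,
# f(x + h) ~ f(x) + h * f'(x) for small h, convex or not.
f = lambda x: np.sin(3 * x) + 0.1 * x**2     # a non-convex function
df = lambda x: 3 * np.cos(3 * x) + 0.2 * x   # its exact derivative

x, h = 0.7, 1e-6
slope = (f(x + h) - f(x)) / h                # finite-difference slope
print(np.isclose(slope, df(x), atol=1e-4))   # True
```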
0
votes
2 answers
Reason for relaxing limit in derivative in this context?
Consider the following paragraph from the Numerical Computation chapter of the deep learning book.
Suppose we have a function $y = f(x)$, where both $x$ and $y$ are real
numbers. The derivative of this function is denoted as $f'(x)$ or as
$\dfrac{dy}{dx}$.…

hanugm
- 3,571
- 3
- 18
- 50
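A small check of the approximation the book is alluding to (my sketch): for small $\varepsilon$, $f(x + \varepsilon) \approx f(x) + \varepsilon f'(x)$, with an error that shrinks like $\varepsilon^2$.

```python
import numpy as np

# f(x) = e^x has f'(x) = e^x; the first-order approximation error
# drops roughly 100x each time eps drops 10x (it is O(eps**2)).
f = df = np.exp
x = 1.0
for eps in (1e-1, 1e-2, 1e-3):
    err = abs(f(x + eps) - (f(x) + eps * df(x)))
    print(eps, err)
```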
0
votes
0 answers
Derivation of the regularized cost function w.r.t. activation and bias
In the regularized cost function, an L2 regularization cost has been added.
Here we have already calculated the derivatives of the cross-entropy cost w.r.t. $A$ and $W$.
As mentioned in the regularization notebook (see below) in order to do derivation of regularized $J$ (cost…

learner
- 151
- 5
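Assuming the usual form from such notebooks, $J_{reg} = J + \frac{\lambda}{2m} \sum_l \|W^{[l]}\|_F^2$ (an assumption about the notebook, since its content is not shown here), the added term depends only on the weights, so

$$\frac{\partial J_{reg}}{\partial W^{[l]}} = \frac{\partial J}{\partial W^{[l]}} + \frac{\lambda}{m} W^{[l]}, \qquad \frac{\partial J_{reg}}{\partial A^{[l]}} = \frac{\partial J}{\partial A^{[l]}}, \qquad \frac{\partial J_{reg}}{\partial b^{[l]}} = \frac{\partial J}{\partial b^{[l]}}.$$

That is, only the weight gradients pick up an extra term; the activation and bias derivatives are unchanged.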
0
votes
1 answer
Backpropagation: Chain Rule to the Third Last Layer
I'm trying to solve $dLoss/dW_1$. The network is as in the picture below, with identity activation at all neurons:
Solving $dLoss/dW_7$ is simple, as there is only one path to the output:
$\Delta = Out - Y$
$Loss = |\Delta|$
In the case when $\Delta \geq 0$, the partial derivative…

Dee
- 1,283
- 1
- 11
- 35
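A minimal sketch of the chain-rule bookkeeping being asked about (layer shapes hypothetical, since the picture is not shown): with identity activations the network is a product of weight matrices, so $dLoss/dW_1$ is just the sign of $\Delta$ times the derivative of that product.

```python
import numpy as np

# Hypothetical 3-layer linear network: Out = W3 @ W2 @ W1 @ x (scalar output),
# Loss = |Out - Y|. Then dLoss/dOut = sign(Delta) and
# dOut/dW1[i, j] = (W3 @ W2)[0, i] * x[j].
W1, W2, W3 = np.random.randn(3, 2), np.random.randn(3, 3), np.random.randn(1, 3)
x, Y = np.random.randn(2, 1), np.random.randn(1, 1)

out = W3 @ W2 @ W1 @ x
delta = out - Y
dLoss_dW1 = np.sign(delta).item() * (W3 @ W2).T @ x.T  # same shape as W1
print(dLoss_dW1.shape)  # (3, 2)
```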