Questions tagged [derivative]
15 questions
4
votes
1 answer
Why is my derivation of the back-propagation equations inconsistent with Andrew Ng's slides from Coursera?
I am using the cross-entropy cost function and computing its derivatives with respect to the variables $Z$, $W$ and $b$ at different stages. Please refer to the image below for the calculation.
As per my knowledge, my derivation is correct for $dZ, dW, db$ and…

learner
- 151
- 5
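As a quick reference for readers (not part of the question itself), here is a minimal sketch of the standard result under Andrew Ng's conventions, assuming a single sigmoid unit and a cross-entropy cost averaged over $m$ examples:

```python
import numpy as np

# Minimal sketch under assumed shapes: X is (n_features, m), Y is (1, m),
# W is (1, n_features), b is (1, 1); cost is cross-entropy averaged over m.
def backward(X, Y, W, b):
    m = X.shape[1]
    Z = W @ X + b                            # pre-activations, shape (1, m)
    A = 1.0 / (1.0 + np.exp(-Z))             # sigmoid activations
    dZ = A - Y                               # dJ/dZ for sigmoid + cross-entropy
    dW = (dZ @ X.T) / m                      # dJ/dW, shape (1, n_features)
    db = dZ.sum(axis=1, keepdims=True) / m   # dJ/db, shape (1, 1)
    return dZ, dW, db
```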
2
votes
1 answer
How is the max function differentiable wrt multiple arguments?
I recently came across an answer on StackOverflow that mentioned the max function being differentiable with respect to its values.
From my current understanding of mathematics, I'm struggling to comprehend how this is possible.
Could someone help…

Peyman
- 534
- 3
- 10
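For context (an illustration, not the linked answer): away from ties, $\max(x_1, \dots, x_n)$ is differentiable, and its partial derivative is 1 for the arg-max entry and 0 for every other argument; at ties the max is only subdifferentiable.

```python
import numpy as np

# Sketch: the gradient of max(x) is an indicator on the arg-max entry,
# valid wherever the maximum is attained by a unique argument.
def max_grad(x):
    g = np.zeros_like(x, dtype=float)
    g[np.argmax(x)] = 1.0
    return g

print(max_grad(np.array([0.2, 1.5, -0.3])))  # [0. 1. 0.]
```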
2
votes
2 answers
Why are critical points and stationary points used interchangeably?
Consider the following paragraph from the Numerical Computation chapter of the deep learning book.
When $f'(x) = 0$, the derivative provides no information about which
direction to move. Points where $f'(x) = 0$ are known as critical
points, or stationary…

hanugm
- 3,571
- 3
- 18
- 50
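A small illustration of the distinction being asked about (not from the book's text): $f(x) = x^3$ has a critical/stationary point at $x = 0$ that is neither a minimum nor a maximum.

```python
# f(x) = x**3 has f'(0) = 0, yet f is strictly increasing through x = 0,
# so the critical point is neither a local minimum nor a local maximum.
f = lambda x: x**3
df = lambda x: 3 * x**2   # analytic derivative

print(df(0.0))                       # 0.0 -> critical (stationary) point
print(f(-1e-3) < f(0.0) < f(1e-3))   # True -> no extremum at 0
```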
2
votes
0 answers
What is the dimensionality of these derivatives in the paper "Active Learning for Reward Estimation in Inverse Reinforcement Learning"?
I'm trying to implement part of the following paper in code: Active Learning for Reward Estimation in Inverse Reinforcement Learning.
I'm specifically referring to section 2.3 of the paper.
Let's define $\mathcal{X}$ as the set of states, and…

ИванКарамазов
- 141
- 5
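The paper's specific quantities are not reproduced here, but the usual shape bookkeeping behind such questions is: for a map $f : \mathbb{R}^n \to \mathbb{R}^m$, the derivative is an $m \times n$ Jacobian. A generic finite-difference check (my own sketch, not the paper's notation):

```python
import numpy as np

# Generic shape check: for f : R^n -> R^m, the Jacobian df/dx is (m, n),
# estimated here with forward finite differences.
def jacobian(f, x, eps=1e-6):
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        J[:, j] = (f(x + e) - fx) / eps
    return J

f = lambda x: np.array([x[0] * x[1], np.sin(x[2]), x.sum()])
print(jacobian(f, np.array([1.0, 2.0, 0.5])).shape)  # (3, 3)
```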
1
vote
2 answers
What does "having Lipschitz continuous derivatives" mean?
We can enforce some constraints on the functions used in deep learning in order to obtain optimization guarantees. You can find this in the Numerical Computation chapter of the deep learning book.
In the context of deep learning, we sometimes gain some guarantees…

hanugm
- 3,571
- 3
- 18
- 50
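For reference, the standard definition (a well-known fact, not quoted from the book's excerpt): $f$ has an $L$-Lipschitz continuous gradient when

$$\|\nabla f(x) - \nabla f(y)\| \leq L \, \|x - y\| \quad \text{for all } x, y,$$

which implies the quadratic upper bound $f(y) \leq f(x) + \nabla f(x)^\top (y - x) + \tfrac{L}{2}\|y - x\|^2$; this bound is what guarantees that gradient descent with step size at most $1/L$ decreases the objective.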
1
vote
0 answers
BlackOut - ICLR 2016: need help understanding the cost function derivative
In the ICLR 2016 paper BlackOut: Speeding up Recurrent Neural Network Language Models with very Large Vocabularies, on page 3, for eq. 4:
$$ J_{ml}^s(\theta) = \log p_{\theta}(w_i \mid s) $$
They have shown the gradient computation in the subsequent…

anurag
- 151
- 1
- 7
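As background (generic, not BlackOut's sampled approximation): with $p = \mathrm{softmax}(z)$, the gradient of $\log p_i$ with respect to the logits is $e_i - p$, which is the starting point for the paper's gradient computation.

```python
import numpy as np

# Generic log-softmax gradient: with p = softmax(z),
# d log p[i] / d z = onehot(i) - p. BlackOut replaces the full softmax
# with a sampled objective, which this sketch does not cover.
def grad_log_softmax(z, i):
    p = np.exp(z - z.max())   # shift for numerical stability
    p /= p.sum()
    g = -p
    g[i] += 1.0
    return g

print(grad_log_softmax(np.array([0.1, 2.0, -1.0]), 1))
```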
0
votes
0 answers
How to select pseudo label samples that minimize validation loss?
I have a problem concerning meta pseudo labeling: I want to select the most significant pseudo-labels, namely those that minimize the validation loss. Let's say I initialize a set of pseudo-labels denoted $Y_{pseudo}$, then I perform a parameter update by gradient…

Việt Nguyễn
- 1
- 1
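One naive way to make the question concrete (all names hypothetical, and only a sketch of one possible selection scheme, not an answer from the literature):

```python
import numpy as np

# Hypothetical sketch: score each candidate pseudo-labelled sample by the
# validation loss after a one-step update that uses it, then keep the k
# samples that lead to the lowest validation loss.
def select_pseudo_labels(train_step, val_loss, params, candidates, k):
    scores = []
    for x, y_pseudo in candidates:
        new_params = train_step(params, x, y_pseudo)  # one gradient step
        scores.append(val_loss(new_params))
    best = np.argsort(scores)[:k]
    return [candidates[i] for i in best]
```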
0
votes
0 answers
Why is there the term $1/m$ in backpropagation?
In backpropagation the gradients are used to update the weights using the formula
$$w = w - \alpha \frac{dL}{dw}$$
and the loss gradient w.r.t. weights is
$$\frac{dL}{dw} = \frac{dL}{dz} \frac{dz}{dw} = (\frac{dL}{da} \frac{da}{dz} \frac{1}{m})…

rkuang25
- 21
- 4
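A short numerical note on where the $1/m$ comes from (an illustration, not the full answer): the cost averages the per-example losses, $L = \frac{1}{m}\sum_i \ell_i$, so every parameter gradient inherits the factor $1/m$.

```python
import numpy as np

# The 1/m appears because L = (1/m) * sum_i l_i: the gradient of an average
# is the average of the per-example gradients.
m = 4
dl_dw = np.array([0.2, -0.1, 0.4, 0.3])      # dl_i/dw for each example
dL_dw = dl_dw.sum() / m                      # gradient of the averaged cost
print(np.isclose(dL_dw, dl_dw.mean()))       # True
```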
0
votes
1 answer
What is the correct partial derivative of $Y^c$ with respect to $A_{ij}^{kc}$?
I have a question about the Grad-CAM++ paper. I do not understand how the following equation (10) for the alphas is obtained:
$$
\alpha_{ij}^{kc} =
\frac{\frac{\partial^2 Y^c}{(\partial A_{ij}^k)^2}}
{2\frac{\partial^2 Y^c}{(\partial A_{ij}^k)^2}
…

mlerma54
- 141
- 5
0
votes
2 answers
How does the vector-space isomorphism between $\mathbb{R}^{m \times n}$ and $\mathbb{R}^{mn}$ justify reshaping matrices into vectors?
Consider the following paragraph from section 5.4 Gradients of Matrices of the chapter Vector Calculus from the textbook titled Mathematics for Machine Learning by Marc Peter Deisenroth et al.
Since matrices represent linear mappings, we can…

hanugm
- 3,571
- 3
- 18
- 50
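A concrete illustration of the isomorphism in question (my own sketch, not the textbook's): reshaping is a linear bijection between $\mathbb{R}^{m \times n}$ and $\mathbb{R}^{mn}$, so nothing is lost by computing gradients on the flattened vector and reshaping back.

```python
import numpy as np

# Reshaping (vec) is linear and invertible, which is exactly what the
# vector-space isomorphism between R^(m x n) and R^(mn) asserts.
m, n = 2, 3
A = np.arange(m * n, dtype=float).reshape(m, n)
B = np.ones((m, n))

v = A.reshape(-1)                        # vec: R^(m x n) -> R^(mn)
assert (v.reshape(m, n) == A).all()      # inverse map recovers A exactly
assert np.allclose((2 * A + 3 * B).reshape(-1),
                   2 * v + 3 * B.reshape(-1))  # vec is linear
```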
0
votes
1 answer
What is the rigorous and formal definition for the direction pointed by a gradient?
Consider the following definition of the derivative from the chapter named Vector Calculus from the textbook titled Mathematics for Machine Learning by Marc Peter Deisenroth et al.
Definition 5.2 (Derivative). More formally, for $h>0$ the…

hanugm
- 3,571
- 3
- 18
- 50
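For reference, the usual way this is made rigorous (standard calculus, not quoted from the book): for a unit vector $v$, the directional derivative is

$$D_v f(x) = \lim_{h \to 0} \frac{f(x + hv) - f(x)}{h} = \nabla f(x)^\top v,$$

and by the Cauchy-Schwarz inequality $\nabla f(x)^\top v \leq \|\nabla f(x)\|$, with equality exactly when $v = \nabla f(x) / \|\nabla f(x)\|$. This is the precise sense in which the gradient points in the direction of steepest ascent.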
0
votes
1 answer
How to understand the slope of a (non-convex) function at a point in its domain?
Consider the following paragraph from the Numerical Computation chapter of the deep learning book, which describes the derivative as the slope of the function's curve at a point:
Suppose we have a function $y= f(x)$, where both $x$ and $y$ are real
numbers. The derivative of this…

hanugm
- 3,571
- 3
- 18
- 50
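A quick numerical illustration (mine, not the book's): the slope at a point is a purely local quantity, so it is well defined even when $f$ is non-convex.

```python
import numpy as np

# The derivative at x is the local linear rate of change,
# f(x + h) ~ f(x) + h * f'(x) for small h, convex or not.
f = lambda x: np.sin(3 * x) + 0.1 * x**2     # a non-convex function
df = lambda x: 3 * np.cos(3 * x) + 0.2 * x   # its exact derivative

x, h = 0.7, 1e-6
slope = (f(x + h) - f(x)) / h                # finite-difference slope
print(np.isclose(slope, df(x), atol=1e-4))   # True
```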
0
votes
2 answers
Reason for relaxing limit in derivative in this context?
Consider the following paragraph from the Numerical Computation chapter of the deep learning book.
Suppose we have a function $y = f(x)$, where both $x$ and $y$ are real
numbers. The derivative of this function is denoted as $f'(x)$ or as
$\dfrac{dy}{dx}$.…

hanugm
- 3,571
- 3
- 18
- 50
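A small check of the approximation the book is alluding to (my sketch): for small $\varepsilon$, $f(x + \varepsilon) \approx f(x) + \varepsilon f'(x)$, with an error that shrinks like $\varepsilon^2$.

```python
import numpy as np

# f(x) = e^x has f'(x) = e^x; the first-order approximation error
# drops roughly 100x each time eps drops 10x (it is O(eps**2)).
f = df = np.exp
x = 1.0
for eps in (1e-1, 1e-2, 1e-3):
    err = abs(f(x + eps) - (f(x) + eps * df(x)))
    print(eps, err)
```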
0
votes
0 answers
Derivation of the regularized cost function w.r.t. activation and bias
In the regularized cost function, an L2 regularization cost has been added.
Here we have already calculated the derivatives of the cross-entropy cost w.r.t. $A$ and $W$.
As mentioned in the regularization notebook (see below) in order to do derivation of regularized $J$ (cost…

learner
- 151
- 5
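Assuming the usual form from such notebooks, $J_{reg} = J + \frac{\lambda}{2m} \sum_l \|W^{[l]}\|_F^2$ (an assumption about the notebook, since its content is not shown here), the added term depends only on the weights, so

$$\frac{\partial J_{reg}}{\partial W^{[l]}} = \frac{\partial J}{\partial W^{[l]}} + \frac{\lambda}{m} W^{[l]}, \qquad \frac{\partial J_{reg}}{\partial A^{[l]}} = \frac{\partial J}{\partial A^{[l]}}, \qquad \frac{\partial J_{reg}}{\partial b^{[l]}} = \frac{\partial J}{\partial b^{[l]}}.$$

That is, only the weight gradients pick up an extra term; the activation and bias derivatives are unchanged.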
0
votes
1 answer
Backpropagation: Chain Rule to the Third Last Layer
I'm trying to solve $dLoss/dW_1$. The network is as in the picture below, with identity activation at all neurons:
Solving $dLoss/dW_7$ is simple, as there is only one path to the output:
$\Delta = Out - Y$
$Loss = |\Delta|$
In the case when $\Delta \geq 0$, the partial derivative…

Dee
- 1,283
- 1
- 11
- 35
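A minimal sketch of the chain-rule bookkeeping being asked about (layer shapes hypothetical, since the picture is not shown): with identity activations the network is a product of weight matrices, so $dLoss/dW_1$ is just the sign of $\Delta$ times the derivative of that product.

```python
import numpy as np

# Hypothetical 3-layer linear network: Out = W3 @ W2 @ W1 @ x (scalar output),
# Loss = |Out - Y|. Then dLoss/dOut = sign(Delta) and
# dOut/dW1[i, j] = (W3 @ W2)[0, i] * x[j].
W1, W2, W3 = np.random.randn(3, 2), np.random.randn(3, 3), np.random.randn(1, 3)
x, Y = np.random.randn(2, 1), np.random.randn(1, 1)

out = W3 @ W2 @ W1 @ x
delta = out - Y
dLoss_dW1 = np.sign(delta).item() * (W3 @ W2).T @ x.T  # same shape as W1
print(dLoss_dW1.shape)  # (3, 2)
```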