Questions tagged [derivative]

15 questions
4
votes
1 answer

Why is my derivation of the back-propagation equations inconsistent with Andrew Ng's slides from Coursera?

I am computing the derivatives of the cross-entropy cost function with respect to the variables $Z, W$ and $b$ at different stages. Please refer to the image below for the calculation. To my knowledge, my derivation is correct for $dZ, dW, db$ and…
learner
  • 151
  • 5
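For reference, a minimal sketch of the derivatives this question compares against (this is not the asker's code; the shapes and variable names follow Ng's course conventions, with a sigmoid output and mean binary cross-entropy, and the analytic $dW$ is checked against a finite difference):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W, b, X, Y):
    # Mean binary cross-entropy over m examples (Ng's J)
    A = sigmoid(W @ X + b)
    m = X.shape[1]
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))                      # 3 features, m = 5 examples
Y = rng.integers(0, 2, size=(1, 5)).astype(float)
W = rng.normal(size=(1, 3))
b = 0.1

# Analytic derivatives as usually written in the slides
A = sigmoid(W @ X + b)
dZ = A - Y                                       # shape (1, m)
dW = dZ @ X.T / X.shape[1]                       # shape (1, 3)
db = np.sum(dZ) / X.shape[1]

# Finite-difference check on one weight
eps = 1e-6
Wp = W.copy(); Wp[0, 0] += eps
Wm = W.copy(); Wm[0, 0] -= eps
num = (cost(Wp, b, X, Y) - cost(Wm, b, X, Y)) / (2 * eps)
print(abs(num - dW[0, 0]) < 1e-6)                # → True
```

A numerical check like this is usually the quickest way to decide whose derivation is right.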
2
votes
1 answer

How is the max function differentiable wrt multiple arguments?

I recently came across an answer on StackOverflow that mentioned the max function being differentiable with respect to its values. From my current understanding of mathematics, I'm struggling to comprehend how this is possible. Could someone help…
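A small illustration of the usual answer: wherever the maximum is attained by a single argument, $\max$ is differentiable, with partial derivative 1 for the argmax and 0 for the rest (at ties it is only subdifferentiable). This sketch checks that against finite differences:

```python
import numpy as np

def max_partials(x):
    # Where the maximum is unique, max is differentiable:
    # d max / d x_i = 1 for the argmax, 0 elsewhere.
    g = np.zeros_like(x, dtype=float)
    g[np.argmax(x)] = 1.0
    return g

x = np.array([0.2, 1.5, -0.3])
print(max_partials(x))                           # → [0. 1. 0.]

# Finite-difference confirmation for each coordinate
eps = 1e-6
for i in range(len(x)):
    e = np.zeros_like(x); e[i] = eps
    num = (np.max(x + e) - np.max(x - e)) / (2 * eps)
    print(np.isclose(num, max_partials(x)[i]))   # → True each time
```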
2
votes
2 answers

Why are critical points and stationary points used interchangeably?

Consider the following paragraph from the Numerical Computation chapter of the deep learning book. When $f'(x) = 0$, the derivative provides no information about which direction to move. Points where $f'(x) = 0$ are known as critical points, or stationary…
hanugm
  • 3,571
  • 3
  • 18
  • 50
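A concrete instance of the quoted definition (illustrative only): for $f(x) = x^3 - 3x$, the critical (stationary) points are where $f'(x) = 3x^2 - 3$ vanishes, and the second derivative classifies them.

```python
# f(x) = x**3 - 3*x has f'(x) = 3*x**2 - 3, so the critical (stationary)
# points are x = -1 and x = +1, where the derivative vanishes.
def fprime(x):
    return 3 * x**2 - 3

def fsecond(x):
    return 6 * x

for x0 in (-1.0, 1.0):
    print(fprime(x0) == 0.0)          # → True: derivative vanishes here
    # Second derivative classifies it: negative → local max, positive → local min
    print("max" if fsecond(x0) < 0 else "min")
```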
2
votes
0 answers

What is the dimensionality of these derivatives in the paper "Active Learning for Reward Estimation in Inverse Reinforcement Learning"?

I'm trying to implement in code part of the following paper: Active Learning for Reward Estimation in Inverse Reinforcement Learning. I'm specifically referring to section 2.3 of the paper. Let's define $\mathcal{X}$ as the set of states, and…
1
vote
2 answers

What does it mean "having Lipschitz continuous derivatives"?

We can enforce some constraints on functions used in deep learning in order to guarantee optimizations. You can find this in the Numerical Computation chapter of the deep learning book. In the context of deep learning, we sometimes gain some guarantees…
hanugm
  • 3,571
  • 3
  • 18
  • 50
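To make the definition in the question concrete (an illustrative sketch, not from the book): $f'$ is Lipschitz continuous with constant $L$ if $|f'(x) - f'(y)| \le L\,|x - y|$ for all $x, y$. For $f(x) = x^2$ this holds with $L = 2$, since $f'(x) = 2x$:

```python
import random

# f(x) = x**2 has derivative f'(x) = 2x, which is Lipschitz with L = 2:
# |f'(x) - f'(y)| = 2|x - y| for all x, y.
def fprime(x):
    return 2 * x

L = 2.0
random.seed(0)
ok = all(
    abs(fprime(x) - fprime(y)) <= L * abs(x - y) + 1e-12
    for x, y in ((random.uniform(-10, 10), random.uniform(-10, 10))
                 for _ in range(1000))
)
print(ok)   # → True
```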
1
vote
0 answers

BlackOut - ICLR 2016: need help understanding the cost function derivative

In the ICLR 2016 paper BlackOut: Speeding up Recurrent Neural Network Language Models with Very Large Vocabularies, on page 3, for eq. 4: $$ J_{ml}^s(\theta) = \log p_{\theta}(w_i \mid s) $$ They show the gradient computation in the subsequent…
anurag
  • 151
  • 1
  • 7
0
votes
0 answers

How to select pseudo label samples that minimize validation loss?

I have a question about meta pseudo-labeling: I want to select the most significant pseudo-labels, i.e. those that minimize the validation loss. Let's say I initialize a set of pseudo-labels denoted $Y_{pseudo}$, then I perform a parameter update by gradient…
0
votes
0 answers

Why Is There the Term $1/m$ in Backpropagation?

In backpropagation the gradients are used to update the weights using the formula $$w = w - \alpha \frac{dL}{dw}$$ and the loss gradient w.r.t. weights is $$\frac{dL}{dw} = \frac{dL}{dz} \frac{dz}{dw} = (\frac{dL}{da} \frac{da}{dz} \frac{1}{m})…
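The usual answer, sketched numerically (illustrative code, not the asker's network; a scalar-weight mean-squared-error model stands in for the cross-entropy setup): the $1/m$ appears because the loss is defined as the *mean* of the per-example losses, and differentiating the mean carries the $1/m$ through to every gradient term.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4
x = rng.normal(size=m)        # inputs (1 feature, m examples)
y = rng.normal(size=m)        # targets
w = 0.5

# Mean squared error: L = (1/m) * sum((w*x - y)**2)
def loss(w):
    return np.mean((w * x - y) ** 2)

# Analytic gradient: dL/dw = (1/m) * sum(2*(w*x - y)*x) — the 1/m
# comes straight from differentiating the mean.
grad = np.sum(2 * (w * x - y) * x) / m

eps = 1e-6
num = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(abs(num - grad) < 1e-6)   # → True
```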
0
votes
1 answer

What is the correct partial derivative of $Y^c$ with respect to $A_{ij}^{kc}$?

I have a question about the Grad-CAM++ paper. I do not understand how the following equation (10) for the alphas is obtained: $$ \alpha_{ij}^{kc} = \frac{\frac{\partial^2 Y^c}{(\partial A_{ij}^k)^2}} {2\frac{\partial^2 Y^c}{(\partial A_{ij}^k)^2} …
mlerma54
  • 141
  • 5
0
votes
2 answers

How does the vector-space isomorphism between $\mathbb{R}^{m \times n}$ and $\mathbb{R}^{mn}$ justify reshaping matrices into vectors?

Consider the following paragraph from section 5.4, Gradients of Matrices, of the chapter Vector Calculus in the textbook Mathematics for Machine Learning by Marc Peter Deisenroth et al. Since matrices represent linear mappings, we can…
hanugm
  • 3,571
  • 3
  • 18
  • 50
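The property the book relies on can be sketched directly (illustrative code, not from the textbook): the flattening map $\mathrm{vec} : \mathbb{R}^{m \times n} \to \mathbb{R}^{mn}$ is linear and invertible, so a gradient computed with respect to the flattened vector can be reshaped back into the matrix without losing information.

```python
import numpy as np

# vec (np.reshape) is linear and invertible, so the matrix <-> vector
# round trip is exact.
A = np.arange(6.0).reshape(2, 3)
v = A.reshape(-1)              # flatten: R^{2x3} -> R^6
B = v.reshape(2, 3)            # un-flatten

print(np.array_equal(A, B))    # → True (round trip is exact)

# Linearity: vec(a*X + b*Y) = a*vec(X) + b*vec(Y)
X, Y = np.ones((2, 3)), np.full((2, 3), 2.0)
print(np.array_equal((3 * X + 4 * Y).reshape(-1),
                     3 * X.reshape(-1) + 4 * Y.reshape(-1)))   # → True
```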
0
votes
1 answer

What is the rigorous and formal definition for the direction pointed by a gradient?

Consider the following definition of the derivative from the chapter Vector Calculus of the textbook Mathematics for Machine Learning by Marc Peter Deisenroth et al. Definition 5.2 (Derivative). More formally, for $h>0$ the…
hanugm
  • 3,571
  • 3
  • 18
  • 50
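The formal statement the question is after is usually phrased via the directional derivative: for a differentiable $f$, the directional derivative in a unit direction $u$ is $\nabla f \cdot u$, which is maximized when $u$ is the normalized gradient. An illustrative numerical check (not from the textbook), using $f(x, y) = x^2 + 3y^2$:

```python
import numpy as np

# For f(x, y) = x**2 + 3*y**2, grad f = (2x, 6y). Among unit directions u,
# the directional derivative grad·u is maximized when u points along grad f.
def grad(p):
    return np.array([2 * p[0], 6 * p[1]])

p = np.array([1.0, 1.0])
g = grad(p)

# Scan unit directions and keep the one with the largest directional derivative
best = None
for theta in np.linspace(0, 2 * np.pi, 720, endpoint=False):
    u = np.array([np.cos(theta), np.sin(theta)])
    d = g @ u                           # directional derivative grad·u
    if best is None or d > best[0]:
        best = (d, u)

u_star = g / np.linalg.norm(g)          # normalized gradient direction
print(np.allclose(best[1], u_star, atol=0.02))   # → True
```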
0
votes
1 answer

How to understand slope of a (non-convex) function at a point in domain?

Consider the following paragraph from the Numerical Computation chapter of the deep learning book, which describes the derivative as the slope of the function's curve at a point. Suppose we have a function $y = f(x)$, where both $x$ and $y$ are real numbers. The derivative of this…
hanugm
  • 3,571
  • 3
  • 18
  • 50
0
votes
2 answers

Reason for relaxing limit in derivative in this context?

Consider the following paragraph from the Numerical Computation chapter of the deep learning book. Suppose we have a function $y = f(x)$, where both $x$ and $y$ are real numbers. The derivative of this function is denoted as $f'(x)$ or as $\dfrac{dy}{dx}$.…
hanugm
  • 3,571
  • 3
  • 18
  • 50
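The "relaxation" being asked about is usually the first-order approximation the book uses right after the definition: for a small but finite $\varepsilon$, $f(x + \varepsilon) \approx f(x) + \varepsilon f'(x)$, which is what gradient descent exploits when it steps along $-f'(x)$. A numerical sketch (illustrative, not from the book):

```python
# For small finite eps, f(x + eps) ≈ f(x) + eps * f'(x);
# the approximation error is O(eps**2).
def f(x):
    return x**3

def fprime(x):
    return 3 * x**2

x, eps = 2.0, 1e-4
approx = f(x) + eps * fprime(x)
exact = f(x + eps)
print(abs(exact - approx) < 1e-6)   # → True: error is O(eps**2)
```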
0
votes
0 answers

Derivative of the regularized cost function w.r.t. activation and bias

In the regularized cost function, an L2 regularization term has been added. Here we have already calculated the cross-entropy cost's derivatives w.r.t. $A, W$. As mentioned in the regularization notebook (see below), in order to derive the regularized $J$ (cost…
learner
  • 151
  • 5
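For context, a sketch of the standard result (illustrative code, not the asker's notebook): with an L2 term $\frac{\lambda}{2m}\sum W^2$ added to the cost, $\partial J/\partial W$ gains an extra $\frac{\lambda}{m} W$, while $\partial J/\partial b$ is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
m, lam = 5, 0.7
W = rng.normal(size=(1, 3))

# L2 regularization term added to the cost: (lambda / (2m)) * sum(W**2)
def reg_term(W):
    return lam / (2 * m) * np.sum(W ** 2)

# Its analytic derivative w.r.t. W is (lambda / m) * W; b does not appear,
# so db is unchanged by regularization.
d_reg = lam / m * W

# Finite-difference check on one entry
eps = 1e-6
Wp = W.copy(); Wp[0, 1] += eps
Wm = W.copy(); Wm[0, 1] -= eps
num = (reg_term(Wp) - reg_term(Wm)) / (2 * eps)
print(abs(num - d_reg[0, 1]) < 1e-8)   # → True
```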
0
votes
1 answer

Backpropagation: Chain Rule to the Third Last Layer

I'm trying to solve $\partial Loss/\partial W_1$. The network is as in the picture below, with identity activation at all neurons. Solving $\partial Loss/\partial W_7$ is simple, as there's only one path to the output: $\Delta = Out - Y$, $Loss = |\Delta|$. In the case when $\Delta \ge 0$, the partial derivative…
Dee
  • 1,283
  • 1
  • 11
  • 35