For questions about calculus (developed by, among others, Newton and Leibniz) in the context of AI and, in particular, machine learning.
Questions tagged [calculus]
25 questions
5
votes
2 answers
Why is the derivative of this objective function 0 if the policy is deterministic?
In the Berkeley RL class CS294-112 Fa18, lecture of 9/5/18, they mention that the following gradient would be 0 if the policy is deterministic.
$$
\nabla_{\theta} J(\theta)=E_{\tau \sim \pi_{\theta}(\tau)}\left[\left(\sum_{t=1}^{T} \nabla_{\theta} \log…

jonperl
- 153
- 7
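A related identity that often comes up in answers to this kind of question (stated here as a general reference point, not necessarily the exact argument used in the lecture): the expected score is always zero, so any factor that is constant under the expectation contributes nothing to the gradient.
$$
\mathbb{E}_{\tau \sim \pi_{\theta}(\tau)}\left[\nabla_{\theta} \log \pi_{\theta}(\tau)\right]
= \int \pi_{\theta}(\tau)\,\frac{\nabla_{\theta} \pi_{\theta}(\tau)}{\pi_{\theta}(\tau)}\,d\tau
= \nabla_{\theta} \int \pi_{\theta}(\tau)\,d\tau
= \nabla_{\theta} 1 = 0 .
$$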
5
votes
1 answer
Why is the change in cost wrt bias in neural network equal to error in the neuron?
While reading the book on neural networks by Michael Nielsen, I had a problem understanding equation (BP3), which is
$$
\frac{\partial C}{\partial b_{j}^{l}}=\delta_{j}^{l} \tag{BP3}\label{BP3},
$$
which can be translated to plain English as…

Madhusoodan P
- 151
- 1
- 4
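For readers skimming this listing, the one-line chain-rule step behind (BP3), assuming Nielsen's usual conventions $z_j^l = \sum_k w_{jk}^l a_k^{l-1} + b_j^l$ and $\delta_j^l = \partial C / \partial z_j^l$:
$$
\frac{\partial C}{\partial b_j^l}
= \frac{\partial C}{\partial z_j^l}\,\frac{\partial z_j^l}{\partial b_j^l}
= \delta_j^l \cdot 1
= \delta_j^l ,
\qquad \text{since } \frac{\partial z_j^l}{\partial b_j^l} = 1 .
$$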
5
votes
2 answers
Are calculus and differential geometry required for building neural networks?
I've been studying geometry and linear algebra for months with the goal of building neural networks. But now I'm reading that perceptrons require fitting curves, and curves are not expressed as linear functions. So, I might need to study differential…

user456280
- 171
- 5
5
votes
2 answers
Which linear algebra book should I read to understand vectorized operations?
I am reading Goodfellow's book on neural networks, but I am stuck on the calculus of the back-propagation algorithm. I understood the principle, and I have watched some YouTube videos explaining this algorithm step by step, but now I would…

lolveley
- 151
- 3
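As a hedged illustration of what "vectorized operations" means here (a minimal NumPy sketch, not the notation used in Goodfellow's book): the gradients of a single dense layer can be written as matrix products instead of per-element loops.

```python
import numpy as np

# Minimal sketch: vectorized forward/backward pass for one dense layer
# y = X @ W + b with squared-error loss. Shapes below are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))   # batch of 64 inputs, 10 features
W = rng.normal(size=(10, 3))    # weights
b = np.zeros(3)                 # biases
T = rng.normal(size=(64, 3))    # targets

Y = X @ W + b                           # forward pass, all examples at once
loss = 0.5 * np.sum((Y - T) ** 2) / len(X)

dY = (Y - T) / len(X)           # dL/dY
dW = X.T @ dY                   # dL/dW: one matrix product replaces nested loops
db = dY.sum(axis=0)             # dL/db
dX = dY @ W.T                   # gradient passed to the previous layer
```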
3
votes
1 answer
Are my computations of the forward and backward pass of a neural network with one input, one hidden, and one output neuron correct?
I have computed the forward and backward passes of the following simple neural network, with one input neuron, one hidden neuron, and one output neuron.
Here are my computations of the forward pass.
\begin{align}
net_1 &= xw_{1}+b \\
h &= \sigma (net_1) \\
net_2 &=…

Eka
- 1,036
- 8
- 23
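For comparison, a generic sketch with the usual conventions (separate biases $b_1, b_2$ and cost $C = \tfrac{1}{2}(y - t)^2$ are assumptions here, not necessarily the question's setup): for $net_1 = xw_1 + b_1$, $h = \sigma(net_1)$, $net_2 = hw_2 + b_2$, $y = \sigma(net_2)$, the backward pass is just the chain rule,
\begin{align}
\frac{\partial C}{\partial w_2} &= (y - t)\,\sigma'(net_2)\,h , \\
\frac{\partial C}{\partial w_1} &= (y - t)\,\sigma'(net_2)\,w_2\,\sigma'(net_1)\,x .
\end{align}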
3
votes
1 answer
What is the partial derivative $\frac{\partial y}{\partial x_1}$ in this neural network?
The answer is supposed to be -6, but I don't know how to get that.
Also, in a neural network, is such a 2nd hidden layer possible, where the neurons do not depend on all the neurons of the previous layer?

duanebobby
- 33
- 3
3
votes
2 answers
What is a bad local minimum in machine learning?
What is "bad local minima"?
The following papers all mention this expression.
Eliminating all bad Local Minima from Loss Landscapes without even adding an Extra Unit
Elimination of All Bad Local Minima in Deep Learning
Adding One Neuron Can…

Umang Gupta
- 200
- 11
2
votes
0 answers
Best calculus books for Deep Learning
Can you recommend some calculus books for deep learning and neural networks? I know what integration, differentiation, derivatives, and limits are at a basic level. I would like to understand, at a deeper level, the calculus behind deep learning and neural networks.

Dan Il
- 21
- 1
2
votes
1 answer
How is the log-derivative trick of a trajectory derived?
I am looking at this formula, which breaks down the gradient of $P(\tau \mid \theta)$. The first part is clear, as is the derivative of $\log(x)$, but I do not see how the first formula is rearranged into the second.

Jacob B
- 227
- 2
- 5
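In case it helps readers browsing this list, the rearrangement being asked about is usually just the identity $\nabla_\theta \log f(\theta) = \nabla_\theta f(\theta) / f(\theta)$ applied to $P(\tau \mid \theta)$:
$$
\nabla_\theta P(\tau \mid \theta)
= P(\tau \mid \theta)\,\frac{\nabla_\theta P(\tau \mid \theta)}{P(\tau \mid \theta)}
= P(\tau \mid \theta)\,\nabla_\theta \log P(\tau \mid \theta) .
$$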
2
votes
0 answers
Is there anything wrong with my focal loss derivation?
Assume $\mathbf{X} \in \mathbb{R}^{N \times C}$ is the input of the softmax $\mathbf{P} \in \mathbb{R}^{N \times C}$, where $N$ is the number of examples and $C$ is the number of classes:
$$\mathbf{p}_i = \left[ \frac{e^{x_{ik}}}{\sum_{j=1}^C e^{x_{ij}}}\right]_{k=1,2,\dots,C} \in \mathbb{R}^{C}…

Giang Tran
- 121
- 1
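One ingredient that this kind of derivation typically relies on (stated as a reference point for readers, not as a check of the full focal-loss algebra): the Jacobian of the softmax defined above is
$$
\frac{\partial p_{ik}}{\partial x_{ij}} = p_{ik}\left(\mathbb{1}[k = j] - p_{ij}\right) .
$$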
2
votes
0 answers
Is the Gradient Descent algorithm part of the Calculus of Variations?
As in https://en.wikipedia.org/wiki/Calculus_of_variations
The calculus of variations is a field of mathematical analysis that uses variations, which are small changes in functions and functionals, to find maxima and minima of functionals.
The…

Dee
- 1,283
- 1
- 11
- 35
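A compact way to see the distinction the question is asking about (a sketch, not a settled classification): gradient descent updates a finite-dimensional parameter vector, whereas the calculus of variations characterizes stationary points of a functional over a space of functions, e.g. via the Euler-Lagrange equation:
$$
\theta_{t+1} = \theta_t - \eta\,\nabla_\theta L(\theta_t)
\qquad \text{vs.} \qquad
\frac{\partial F}{\partial f} - \frac{d}{dx}\frac{\partial F}{\partial f'} = 0
\quad \text{for } J[f] = \int F(x, f, f')\,dx .
$$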
1
vote
1 answer
How do policy gradients work?
If I understand it correctly from the following equation
$$U(\theta)=\mathbb{E}_{\tau \sim P(\tau;\theta)}\left [ \sum_{t=0}^{H-1}R(s_t,u_t);\pi_{\theta} \right ]=\sum_{\tau}P(\tau;\theta)R(\tau)$$
from this paper, the utility of a policy…

User
- 165
- 4
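For readers scanning this list, the standard next step from the expression above (the usual REINFORCE-style derivation, assumed rather than quoted from the linked paper):
$$
\nabla_\theta U(\theta)
= \sum_{\tau} \nabla_\theta P(\tau;\theta)\,R(\tau)
= \sum_{\tau} P(\tau;\theta)\,\nabla_\theta \log P(\tau;\theta)\,R(\tau)
= \mathbb{E}_{\tau \sim P(\tau;\theta)}\!\left[\nabla_\theta \log P(\tau;\theta)\,R(\tau)\right] .
$$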
1
vote
1 answer
What does the gradient tell us, other than the direction in which to move the parameters?
Gradients are used in optimization algorithms.
I know that a gradient gives us information about the direction in which one needs to update the weights of a neural network. We need to travel in the opposite direction of the gradient to get optimal…

hanugm
- 3,571
- 3
- 18
- 50
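A one-line way to see what else the gradient carries (a generic first-order fact, not specific to neural networks): its magnitude gives the local rate of change, via the first-order Taylor approximation
$$
L(\theta + \Delta\theta) \approx L(\theta) + \nabla_\theta L(\theta)^{\top} \Delta\theta ,
$$
so the norm $\|\nabla_\theta L(\theta)\|$ tells us how steep the loss surface is locally, not just which way is downhill.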
1
vote
0 answers
BlackOut - ICLR 2016: need help understanding the cost function derivative
In the ICLR 2016 paper BlackOut: Speeding up Recurrent Neural Network Language Models with very Large Vocabularies, on page 3, for eq. 4:
$$ J_{ml}^s(\theta) = \log p_{\theta}(w_i \mid s) $$
They have shown the gradient computation in the subsequent…

anurag
- 151
- 1
- 7
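For context while browsing (this is the generic maximum-likelihood case, not the sampling-based BlackOut objective itself, and the scores $u_j(\theta)$ are my notation): if $p_\theta(w_j \mid s) \propto e^{u_j(\theta)}$ is a softmax, then
$$
\nabla_\theta \log p_\theta(w_i \mid s)
= \nabla_\theta u_{w_i}(\theta) - \sum_{j} p_\theta(w_j \mid s)\,\nabla_\theta u_j(\theta) .
$$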
1
vote
0 answers
For the generalised delta rule in back-propagation, do you subtract the target from the obtained output, or vice versa?
When I look up the generalised delta rule equation for back-propagation, I see two conflicting equations.
For example, here (slide 20), given $o$ (the output, defined in slide 18), $z$ (the activated output) and a target $t$, defined in slide…

Slowat_Kela
- 287
- 2
- 9
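For anyone landing here from the listing, the sign ambiguity usually comes down to where the minus sign from the derivative is absorbed (a generic sketch with squared error, not the exact notation from the linked slides): with $E = \tfrac{1}{2}(t - o)^2$,
$$
\frac{\partial E}{\partial o} = -(t - o) = (o - t) ,
$$
so one convention writes the error term as $(t - o)$ and uses $\Delta w = +\eta\,\delta\,x$, while the other writes it as $(o - t)$ and keeps the minus sign in $\Delta w = -\eta\,\partial E / \partial w$; both give the same weight change.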