Questions tagged [objective-functions]
254 questions
For questions related to the concept of a loss (or cost) function in the context of machine learning.
20 votes · 3 answers
How can we process the data from both the true distribution and the generator?
I'm struggling to understand the GAN loss function as provided in Understanding Generative Adversarial Networks (a blog post written by Daniel Seita).
In the standard cross-entropy loss, we have an output that has been run through a sigmoid function…

tryingtolearn
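For reference, the cross-entropy setup the question refers to can be sketched in a few lines: the discriminator is a binary classifier whose sigmoid output is scored against label 1 for real samples and label 0 for generated ones. The logits below are made-up numbers, not taken from the post.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, label):
    # standard cross-entropy for a sigmoid output p and a label in {0, 1}
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# The discriminator sees real samples (label 1) and generated samples
# (label 0); its loss is the sum of the two cross-entropy terms:
#   -E_x[log D(x)] - E_z[log(1 - D(G(z)))]
d_real = sigmoid(2.0)   # D(x) for a real sample's logit (arbitrary value)
d_fake = sigmoid(-1.5)  # D(G(z)) for a generated sample's logit
d_loss = bce(d_real, 1) + bce(d_fake, 0)
```

This is how the expectation over both the true distribution and the generator enters a single loss: each minibatch mixes samples from the data with samples from G.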
10 votes · 1 answer
Loss jumps abruptly when I decay the learning rate with Adam optimizer in PyTorch
I'm training an auto-encoder network with the Adam optimizer (with amsgrad=True) and MSE loss for a single-channel audio source separation task. Whenever I decay the learning rate by a factor, the network loss jumps abruptly and then decreases until the…

imflash217
8 votes · 1 answer
How is the DQN loss derived from (or theoretically motivated by) the Bellman equation, and how is it related to the Q-learning update?
I'm doing a project on Reinforcement Learning. I programmed an agent that uses DDQN. There are a lot of tutorials on that, so the code implementation was not that hard.
However, I have problems understanding how one should come up with this kind of…

Yves Boutellier
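The loss the question asks about is the squared TD error built from the Bellman optimality target r + γ·max_a' Q(s', a'); a minimal sketch for a single transition, with hypothetical values:

```python
def dqn_loss(q_sa, reward, q_next_max, gamma=0.99, done=False):
    """Squared TD error for one transition (s, a, r, s').

    The target r + gamma * max_a' Q_target(s', a') comes from the
    Bellman optimality equation; minimizing the squared difference
    pulls Q(s, a) toward that target, mirroring the Q-learning update.
    """
    target = reward + (0.0 if done else gamma * q_next_max)
    return (q_sa - target) ** 2

# hypothetical values for one transition
loss = dqn_loss(q_sa=1.0, reward=0.5, q_next_max=2.0)
```

In DQN (and DDQN) the target is computed with a frozen target network, which is why the loss looks like supervised regression toward a moving target.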
8 votes · 1 answer
What's the advantage of log_softmax over softmax?
Previously, I learned that softmax as the output layer, coupled with the log-likelihood cost function (the same as nll_loss in PyTorch), can solve the learning-slowdown problem.
However, while working through the PyTorch MNIST tutorial,…

user1024
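One concrete advantage is numerical stability: log_softmax can be computed with the max-subtraction trick, whereas composing log(softmax(x)) overflows or underflows for extreme logits. A small pure-Python illustration:

```python
import math

def log_softmax(logits):
    # subtract the max first: exp() never overflows, and the log of a
    # tiny softmax probability is obtained without underflowing to 0
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def naive_log_softmax(logits):
    s = sum(math.exp(z) for z in logits)  # overflows for large logits
    return [math.log(math.exp(z) / s) for z in logits]

logits = [1000.0, 0.0]
stable = log_softmax(logits)  # roughly [0.0, -1000.0]
# naive_log_softmax(logits) raises OverflowError on math.exp(1000.0)
```

This is the same reason the fused `log_softmax` + `nll_loss` pairing (i.e. cross-entropy on logits) is preferred in PyTorch.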
7 votes · 4 answers
Can the mean squared error be negative?
I'm new to machine learning. I was watching one of Prof. Andrew Ng's videos about gradient descent from his online machine learning course. It said that we want our cost function (in this case, the mean squared error) to have the minimum value, but that…

Borna Ghahnoosh
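The answer follows directly from the definition: the MSE is a mean of squared real numbers, each term is ≥ 0, so the mean can never be negative. A minimal check:

```python
def mse(preds, targets):
    # mean of squared differences: each term is a real number squared,
    # so every term is >= 0 and the mean cannot be negative
    errors = [(p - t) ** 2 for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)

print(mse([1.0, 2.0], [3.0, 0.0]))  # 4.0
print(mse([5.0, 5.0], [5.0, 5.0]))  # 0.0, the minimum possible value
```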
7 votes · 1 answer
What is an objective function?
Local search algorithms are useful for solving pure optimization problems, in which the aim is to find the best state according to an objective function.
My question is what is the objective function?

Abbas Ali
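As a toy illustration of the definition, here is a hypothetical one-dimensional local search: the objective function assigns each state a score, and the search moves to whichever neighbor scores best.

```python
def objective(state):
    # the objective function scores a state; local search seeks the
    # state with the best (here: lowest) score. Toy example: squared
    # distance of an integer state from 7.
    return (state - 7) ** 2

def hill_climb(start, steps=100):
    state = start
    for _ in range(steps):
        best = min([state - 1, state + 1], key=objective)
        if objective(best) >= objective(state):
            return state  # no neighbor improves: a local optimum
        state = best
    return state

print(hill_climb(0))  # 7
```

The objective function is the only problem-specific ingredient here; the search procedure itself never needs to know what the scores mean.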
7 votes · 2 answers
How should we interpret this figure that relates the perceptron criterion and the hinge loss?
I am currently studying the textbook Neural Networks and Deep Learning by Charu C. Aggarwal. Chapter 1.2.1.2 Relationship with Support Vector Machines says the following:
The perceptron criterion is a shifted version of the hinge-loss used in…

The Pointer
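The relationship in the excerpt can be made concrete: for a label y ∈ {-1, +1} and a score s = w·x, the perceptron criterion is max(0, -y·s) while the hinge loss is max(0, 1 - y·s), i.e. the same function shifted by a margin of 1. A sketch (the score 0.4 is an arbitrary example, not from the book):

```python
def perceptron_criterion(y, score):
    # penalizes only misclassified points (y * score < 0)
    return max(0.0, -y * score)

def hinge_loss(y, score):
    # shifted by a margin of 1: also penalizes correct but
    # low-confidence predictions with 0 <= y * score < 1
    return max(0.0, 1.0 - y * score)

# y = +1, score = 0.4: correctly classified but inside the margin,
# so the perceptron criterion is 0.0 while the hinge loss is 0.6
```

That margin region is exactly where the two curves in such figures differ.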
7 votes · 1 answer
What loss function to use when labels are probabilities?
What loss function is most appropriate when training a model with target values that are probabilities? For example, I have a 3-output model. I want to train it with a feature vector $x=[x_1, x_2, \dots, x_N]$ and a target $y=[0.2, 0.3, 0.5]$.
It…

Thomas Johnson
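Cross-entropy remains well defined when the target is itself a distribution: one common choice for such targets is the soft cross-entropy -Σᵢ yᵢ log pᵢ, sketched below with the question's target vector (the alternative prediction is a made-up example):

```python
import math

def soft_cross_entropy(target_probs, predicted_probs):
    # H(y, p) = -sum_i y_i * log(p_i); over valid distributions p,
    # it is minimized when p == y, so it suits probability-valued targets
    return -sum(y * math.log(p) for y, p in zip(target_probs, predicted_probs))

y = [0.2, 0.3, 0.5]                         # target from the question
perfect = soft_cross_entropy(y, y)          # equals the entropy of y
worse = soft_cross_entropy(y, [0.6, 0.2, 0.2])
```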
6 votes · 2 answers
What is the difference between a loss function and reward/penalty in Deep Reinforcement Learning?
In Deep Reinforcement Learning (DRL), I am having difficulty understanding the difference between a loss function and a reward/penalty, and how the two are combined in DRL.
Loss function: Given an output of the model and the ground truth,…

Theo Deep
6 votes · 1 answer
What is the cost function of a transformer?
The paper Attention Is All You Need describes the transformer architecture that has an encoder and a decoder.
However, I wasn't clear on what the cost function to minimize is for such an architecture.
Consider a translation task, for example, where…

user3667125
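The transformer in that paper is trained with cross-entropy over the predicted next-token distribution at each decoder position (the paper additionally applies label smoothing, omitted here). A toy sketch with a made-up 4-word vocabulary:

```python
import math

def sequence_cross_entropy(target_ids, predicted_dists):
    # sum of -log p(correct next token) over decoder positions; this
    # (usually averaged over tokens) is the quantity being minimized
    return sum(-math.log(dist[t]) for t, dist in zip(target_ids, predicted_dists))

# hypothetical 3-token target sequence over a 4-word vocabulary
target = [2, 0, 3]
dists = [
    [0.10, 0.20, 0.60, 0.10],  # position 1: correct token has p = 0.6
    [0.70, 0.10, 0.10, 0.10],  # position 2: correct token has p = 0.7
    [0.05, 0.05, 0.10, 0.80],  # position 3: correct token has p = 0.8
]
loss = sequence_cross_entropy(target, dists)
```

For translation, the targets are the reference sentence's tokens, shifted by one position relative to the decoder input.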
6 votes · 2 answers
Why do the TensorFlow docs discourage using softmax as the activation for the last layer?
The beginner colab example for tensorflow states:
Note: It is possible to bake this tf.nn.softmax in as the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is…

galah92
6 votes · 1 answer
Why is the evidence equal to the KL divergence plus the loss?
Why is the equation $$\log p_{\theta}(x^1,...,x^N)=D_{KL}(q_{\theta}(z|x^i)||p_{\phi}(z|x^i))+\mathbb{L}(\phi,\theta;x^i)$$ true, where $x^i$ are data points and $z$ are latent variables?
I was reading the original variational autoencoder paper and I…

user8714896
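Using the original paper's notation ($q_\phi$ is the approximate posterior, $p_\theta$ the model), the identity follows from writing $\log p_\theta(x)$ as an expectation over $q_\phi$ and splitting $\log p_\theta(x) = \log p_\theta(x,z) - \log p_\theta(z \mid x)$:

```latex
\log p_\theta(x)
= \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log \frac{p_\theta(x,z)}{q_\phi(z \mid x)}\right]
+ \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log \frac{q_\phi(z \mid x)}{p_\theta(z \mid x)}\right]
= \mathcal{L}(\theta, \phi; x)
+ D_{KL}\!\left(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\right)
```

Since the KL term is non-negative, $\mathcal{L}$ is a lower bound on the evidence $\log p_\theta(x)$, which is why it is called the ELBO.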
6 votes · 1 answer
What is the formula used to calculate the loss in the FaceNet model?
The FaceNet model returns the loss between the predictions and the ground truth. How is this loss calculated?

TheReal__Mike
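FaceNet is trained with a triplet loss over squared embedding distances; a minimal sketch (the distance values below are made-up numbers):

```python
def triplet_loss(d_ap, d_an, alpha=0.2):
    # FaceNet's triplet loss: the squared anchor-positive distance
    # should be smaller than the squared anchor-negative distance by
    # at least a margin alpha, i.e. max(0, d_ap - d_an + alpha)
    return max(0.0, d_ap - d_an + alpha)

# d_ap, d_an stand for ||f(a) - f(p)||^2 and ||f(a) - f(n)||^2
easy = triplet_loss(d_ap=0.1, d_an=0.9)  # 0.0: already well separated
hard = triplet_loss(d_ap=0.8, d_an=0.3)  # 0.7: violates the margin
```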
5 votes · 1 answer
What is the difference between the triplet loss and the contrastive loss?
What is the difference between the triplet loss and the contrastive loss?
They look the same to me. I don't understand the nuances between the two. I have the following queries:
When to use what?
What are the use cases and advantages or disadvantages…

Exploring
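One key nuance: the contrastive loss acts on pairs, while the triplet loss acts on anchor/positive/negative triplets. A sketch of one common pairwise formulation (the 1/2 factors follow Hadsell et al.; the margin value is an assumption):

```python
def contrastive_loss(d, same_pair, margin=1.0):
    # pairwise loss: pull matching pairs together (any distance is
    # penalized), push non-matching pairs apart up to a margin;
    # d is the embedding distance for the pair
    if same_pair:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

matching = contrastive_loss(0.3, same_pair=True)    # 0.045
mismatch = contrastive_loss(0.3, same_pair=False)   # 0.245
far_mismatch = contrastive_loss(2.0, same_pair=False)  # 0.0: beyond margin
```

The triplet loss, by contrast, only constrains distances relative to each other within a triplet, which is often cited as its advantage for ranking-style embedding tasks.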
5 votes · 2 answers
How to check whether my loss function is convex or not?
A loss function measures the error of a network's predictions, and that error is then used to update the network's weights. Loss functions are thus central to training neural networks.
Consider the following excerpt from this answer
In principle, differentiability is…

hanugm
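A cheap numerical heuristic is to sample the midpoint inequality f((x+y)/2) ≤ (f(x)+f(y))/2: a single violation proves non-convexity, while passing many trials is only evidence, not a proof. A sketch over a hypothetical one-dimensional domain:

```python
import random

def looks_convex(f, lo=-5.0, hi=5.0, trials=1000, tol=1e-9):
    """Sample the midpoint inequality f((x+y)/2) <= (f(x)+f(y))/2.

    Returns False as soon as a violation is found (proof of
    non-convexity); returning True only means no violation was sampled.
    """
    rng = random.Random(0)  # fixed seed for reproducibility
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        if f((x + y) / 2) > (f(x) + f(y)) / 2 + tol:
            return False
    return True

print(looks_convex(lambda x: x * x))   # True: x^2 is convex
print(looks_convex(lambda x: x ** 3))  # False: x^3 is not convex on [-5, 5]
```

For losses over high-dimensional weight vectors the same test applies along random lines through weight space, which is how non-convexity of neural network losses is usually demonstrated empirically.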