Questions tagged [objective-functions]

For questions related to the concept of a loss (or cost) function in the context of machine learning.

See e.g. https://en.wikipedia.org/wiki/Loss_function.

254 questions
20
votes
3 answers

How can we process the data from both the true distribution and the generator?

I'm struggling to understand the GAN loss function as provided in Understanding Generative Adversarial Networks (a blog post written by Daniel Seita). In the standard cross-entropy loss, we have an output that has been run through a sigmoid function…
10
votes
1 answer

Loss jumps abruptly when I decay the learning rate with Adam optimizer in PyTorch

I'm training an auto-encoder network with the Adam optimizer (with amsgrad=True) and MSE loss for a single-channel audio source separation task. Whenever I decay the learning rate by a factor, the network loss jumps abruptly and then decreases until the…
8
votes
1 answer

How is the DQN loss derived from (or theoretically motivated by) the Bellman equation, and how is it related to the Q-learning update?

I'm doing a project on Reinforcement Learning. I programmed an agent that uses DDQN. There are a lot of tutorials on that, so the code implementation was not that hard. However, I have problems understanding how one should come up with this kind of…
8
votes
1 answer

What's the advantage of log_softmax over softmax?

Previously I learned that softmax as the output layer, coupled with the log-likelihood cost function (the same as the nll_loss in PyTorch), can solve the learning slowdown problem. However, while I am learning the PyTorch MNIST tutorial,…
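One way to see the practical side of this question is numerical stability. The sketch below is a minimal pure-Python illustration, not PyTorch's implementation: the function names naive_log_softmax, stable_log_softmax and nll are hypothetical helpers introduced here. It compares taking the log of a softmax with computing the log-softmax directly via the log-sum-exp trick.

```python
import math

def naive_log_softmax(logits):
    # Exponentiate first, then take the log.
    # math.exp overflows for large logits (about 710 and above for floats).
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [math.log(e / total) for e in exps]

def stable_log_softmax(logits):
    # Log-sum-exp trick: subtract the maximum before exponentiating,
    # so the largest exponent is exp(0) = 1 and nothing overflows.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll(log_probs, target_index):
    # Negative log-likelihood of the target class, given log-probabilities.
    return -log_probs[target_index]

small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0, 1002.0]  # same logits shifted by a constant

print(nll(stable_log_softmax(small), 2))
print(nll(stable_log_softmax(large), 2))  # same loss: softmax is shift-invariant

try:
    naive_log_softmax(large)
except OverflowError as err:
    print("naive version failed:", err)
```

Under these assumptions, the stable version gives the same loss for both inputs, while the naive version fails on the large logits.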
7
votes
4 answers

Can the mean squared error be negative?

I'm new to machine learning. I was watching Prof. Andrew Ng's video about gradient descent from the machine learning online course. It said that we want our cost function (in this case, the mean squared error) to have the minimum value, but that…
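As a small illustration of the title question, here is a minimal sketch of the mean squared error in plain Python (the function name mean_squared_error is a hypothetical helper, not taken from the course). Every term is a squared real number, so the average cannot fall below zero, and it is exactly zero only when all predictions match the targets.

```python
def mean_squared_error(y_true, y_pred):
    # Average of squared differences; each term (t - p) ** 2 is >= 0.
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

targets = [1.0, 2.0, 3.0]
print(mean_squared_error(targets, [1.0, 2.0, 3.0]))  # perfect fit
print(mean_squared_error(targets, [2.0, 2.0, 5.0]))  # errors of 1, 0 and 2
```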
7
votes
1 answer

What is an objective function?

Local search algorithms are useful for solving pure optimization problems, in which the aim is to find the best state according to an objective function. My question is: what is the objective function?
7
votes
2 answers

How should we interpret this figure that relates the perceptron criterion and the hinge loss?

I am currently studying the textbook Neural Networks and Deep Learning by Charu C. Aggarwal. Chapter 1.2.1.2 Relationship with Support Vector Machines says the following: The perceptron criterion is a shifted version of the hinge-loss used in…
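The "shifted version" relationship mentioned in this excerpt can be checked numerically. The sketch below assumes the common per-example definitions, with label y in {-1, +1} and score s = w·x: perceptron criterion max(0, -y·s) and hinge loss max(0, 1 - y·s). The function names are hypothetical, and the book's exact notation may differ.

```python
def perceptron_criterion(margin):
    # margin = y * s, where s is the linear score w.x and y is +1 or -1.
    # Penalizes only misclassified points (negative margin).
    return max(0.0, -margin)

def hinge_loss(margin):
    # Also penalizes correctly classified points with margin below 1.
    return max(0.0, 1.0 - margin)

for m in [-2.0, -0.5, 0.0, 0.5, 1.0, 2.0]:
    # The hinge loss at margin m equals the perceptron criterion at m - 1,
    # i.e. the same curve shifted one unit to the right.
    print(m, perceptron_criterion(m), hinge_loss(m), perceptron_criterion(m - 1.0))
```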
7
votes
1 answer

What loss function to use when labels are probabilities?

What loss function is most appropriate when training a model with target values that are probabilities? For example, I have a 3-output model. I want to train it with a feature vector $x=[x_1, x_2, \dots, x_N]$ and a target $y=[0.2, 0.3, 0.5]$. It…
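One commonly used option for probability-valued targets is cross-entropy against the full target distribution (equivalently, KL divergence, which differs only by a constant that does not depend on the model). The sketch below is a minimal pure-Python illustration using the target from the excerpt; the helper names softmax and soft_cross_entropy are hypothetical, and this is not presented as the accepted answer to the question.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(target, logits):
    # Cross-entropy H(target, softmax(logits)) with a probability-vector target.
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

target = [0.2, 0.3, 0.5]

# Logits whose softmax reproduces the target exactly.
matching_logits = [math.log(t) for t in target]
# Logits that give a uniform prediction instead.
uniform_logits = [0.0, 0.0, 0.0]

print(soft_cross_entropy(target, matching_logits))  # equals the entropy of target
print(soft_cross_entropy(target, uniform_logits))   # strictly larger
```

The loss is minimized when the predicted distribution equals the target, and its minimum value is the entropy of the target rather than zero.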
6
votes
2 answers

What is the difference between a loss function and reward/penalty in Deep Reinforcement Learning?

In deep reinforcement learning (DRL), I am having difficulty understanding the difference between a loss function and a reward/penalty, and how the two are integrated in DRL. Loss function: Given an output of the model and the ground truth,…
6
votes
1 answer

What is the cost function of a transformer?

The paper Attention Is All You Need describes the transformer architecture that has an encoder and a decoder. However, I wasn't clear on what the cost function to minimize is for such an architecture. Consider a translation task, for example, where…
6
votes
2 answers

Why do the TensorFlow docs discourage using softmax as activation for the last layer?

The beginner Colab example for TensorFlow states: Note: It is possible to bake this tf.nn.softmax in as the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is…
6
votes
1 answer

Why is the evidence equal to the KL divergence plus the loss?

Why is the equation $$\log p_{\theta}(x^1,...,x^N)=D_{KL}(q_{\theta}(z|x^i)||p_{\phi}(z|x^i))+\mathbb{L}(\phi,\theta;x^i)$$ true, where $x^i$ are data points and $z$ are latent variables? I was reading the original variational autoencoder paper and I…
6
votes
1 answer

What is the formula used to calculate the loss in the FaceNet model?

The FaceNet model returns the loss of the predictions and ground-truth classes. How is this loss calculated?
5
votes
1 answer

What is the difference between the triplet loss and the contrastive loss?

What is the difference between the triplet loss and the contrastive loss? They look the same to me. I don't understand the nuances between the two. I have the following queries: When to use what? What are the use cases and advantages or disadvantages…
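To make the structural difference concrete, the sketch below writes both losses in plain Python under one common convention: a pairwise contrastive loss with a similar/dissimilar label, and a triplet loss on squared Euclidean distances. The function names, the margin values and the toy embeddings are all assumptions made for illustration; published formulations differ in scaling factors and in whether distances are squared.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(a, b, is_similar, margin=1.0):
    # Operates on a PAIR plus a label.
    # Similar pairs are pulled together; dissimilar pairs are pushed
    # apart until they are at least `margin` away from each other.
    d = euclidean(a, b)
    if is_similar:
        return d ** 2
    return max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Operates on a TRIPLET with no explicit label.
    # Only the relative ordering matters: the positive should be closer
    # to the anchor than the negative, by at least `margin`.
    d_pos = euclidean(anchor, positive) ** 2
    d_neg = euclidean(anchor, negative) ** 2
    return max(0.0, d_pos - d_neg + margin)

anchor = [0.0, 0.0]
positive = [0.0, 0.5]
near_negative = [0.0, 0.6]
far_negative = [3.0, 4.0]

print(contrastive_loss(anchor, positive, True))
print(contrastive_loss(anchor, far_negative, False))     # beyond the margin
print(triplet_loss(anchor, positive, near_negative))     # ordering barely satisfied
print(triplet_loss(anchor, positive, far_negative))      # ordering satisfied by a wide gap
```

In this convention the contrastive loss constrains absolute distances for each pair, while the triplet loss constrains only the gap between the two distances.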
5
votes
2 answers

How to check whether my loss function is convex or not?

Loss functions are useful in calculating loss and then we can update the weights of a neural network. The loss function is thus useful in training neural networks. Consider the following excerpt from this answer: In principle, differentiability is…