Questions tagged [loss]

For questions related to the concept of loss (or cost) in machine learning or other AI sub-fields.

79 questions
10 votes · 3 answers

Should I choose a model with the smallest loss or highest accuracy?

I have two machine learning models (I use LSTMs) that give different results on the validation set (~100 samples): Model A: accuracy ~91%, loss ~0.01; Model B: accuracy ~83%, loss ~0.003. The size and the speed of both models are almost the…
asked by malioboro
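
A toy illustration of why the two metrics can disagree (hypothetical numbers, not taken from the question): accuracy only counts thresholded decisions, while cross-entropy loss also penalizes confidence, so one model can win on accuracy and lose on loss.

    # Sketch: accuracy vs. cross-entropy loss on made-up predictions.
    import numpy as np

    y = np.array([1, 1, 1, 1, 1])                   # true binary labels
    p_a = np.array([0.99, 0.99, 0.99, 0.99, 0.01])  # model A: confident, one big mistake
    p_b = np.array([0.60, 0.60, 0.60, 0.45, 0.45])  # model B: cautious, two small mistakes

    def accuracy(y, p):
        return np.mean((p > 0.5) == y)

    def log_loss(y, p):
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    print(accuracy(y, p_a), log_loss(y, p_a))  # 0.8, ~0.93: higher accuracy, higher loss
    print(accuracy(y, p_b), log_loss(y, p_b))  # 0.6, ~0.63: lower accuracy, lower loss
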
3 votes · 1 answer

Has anyone tried to train a GPT model to predict the next N tokens instead of just the next one?

I have been thinking about how learning from text works in humans: we read words, and often we need to read a few words ahead to understand more clearly the ideas we read before. Most of the time, just reading the next word in a sentence is not…
asked by bruno
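
For concreteness, a minimal sketch of what such an objective could look like (my own, and simplistic: it reuses one output head for every offset, whereas models trained this way typically add a separate head per future position):

    # Hypothetical sketch: average the cross-entropy over the next n
    # target positions instead of only position t+1 (teacher forcing).
    import torch.nn.functional as F

    def next_n_token_loss(logits, tokens, n=3):
        # logits: (batch, seq_len, vocab), e.g. from a hypothetical model(tokens);
        # position t is asked to predict the tokens at t+1, ..., t+n.
        losses = []
        for k in range(1, n + 1):
            pred = logits[:, :-k, :]   # drop positions with no k-step-ahead target
            target = tokens[:, k:]     # targets shifted k steps into the future
            losses.append(F.cross_entropy(pred.reshape(-1, pred.size(-1)),
                                          target.reshape(-1)))
        return sum(losses) / n
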
3 votes · 0 answers

How to interpret the training loss curves in Soft Actor-Critic (SAC)?

I am using the stable-baselines3 implementation of the Soft Actor-Critic (SAC) algorithm. The plotted training curves look promising. However, I am not fully sure how to interpret the actor and critic losses. The entropy coefficient $\alpha$ is…
2 votes · 2 answers

Does the MSE loss function work in NN training for predicting values between 0 and 1?

In an NN regression problem, considering that MSE squares the error and the error is between 0 and 1, would it be pointless to use MSE as our loss function during model training? For example: MSE = (y_pred - y_true) ^ 2 # Expected model output…
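
A quick sanity check (my own sketch): squaring does shrink the loss value for errors below 1, but the gradient 2 * (y_pred - y_true) is still proportional to the error, so MSE remains a perfectly usable training signal; only its scale changes.

    # Sketch: for errors in (0, 1) the MSE *value* is small, but the
    # gradient still points the right way and scales with the error.
    import torch

    y_true = torch.tensor(0.7)
    y_pred = torch.tensor(0.4, requires_grad=True)

    loss = (y_pred - y_true) ** 2
    loss.backward()
    print(loss.item())         # 0.09: small, but not meaningless
    print(y_pred.grad.item())  # 2 * (0.4 - 0.7) = -0.6: a healthy gradient
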
2 votes · 0 answers

GAN: Why does a perfect discriminator mean no gradient for the generator?

In the training of a Generative Adversarial Network (GAN), a perfect discriminator (D) is one which outputs 1 ("true image") for all images of the training dataset and 0 ("false image") for all images created by the generator (G). I've read…
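
A small autograd sketch (my own) of the usual saturation argument: with the original log(1 - D(G(z))) generator loss, a near-perfect discriminator drives its output sigmoid into a flat region, so almost no gradient flows back to the generator.

    # Sketch: when D is (near-)perfect, D(G(z)) ~ 0, i.e. the sigmoid
    # logit is very negative, and there d/dlogit log(1 - sigmoid(logit))
    # is ~0: the generator receives essentially no learning signal.
    import torch

    logit = torch.tensor(-20.0, requires_grad=True)  # D is sure the sample is fake
    d_out = torch.sigmoid(logit)                     # ~2e-9
    g_loss = torch.log(1 - d_out)                    # original minimax generator loss
    g_loss.backward()
    print(logit.grad.item())                         # ~ -2e-9: vanishing gradient
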
2 votes · 2 answers

Val loss doesn’t decrease after a certain number of epochs

I’m working on a classification problem (500 classes). My NN has 3 fully connected layers, followed by an LSTM layer. I use nn.CrossEntropyLoss() as my loss function. This is my network’s configuration: Model( (fc): Sequential( (0):…
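
One common pitfall worth ruling out here (a guess, not a diagnosis of this particular model): nn.CrossEntropyLoss applies log-softmax internally, so it must be fed raw logits; an extra Softmax layer at the output squashes the gradients and can make the loss plateau.

    # Sketch: correct usage of nn.CrossEntropyLoss with 500 classes.
    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()
    logits = torch.randn(8, 500)             # raw scores straight from the last layer
    targets = torch.randint(0, 500, (8,))

    loss = criterion(logits, targets)        # correct: feed logits directly
    # wrong: criterion(torch.softmax(logits, dim=1), targets)  # double squashing
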
2 votes · 1 answer

Why do we subtract logsumexp from the outputs of this neural network?

I'm trying to understand this tutorial for JAX. Here's an excerpt; it's for a neural net designed to classify MNIST images: from jax.scipy.special import logsumexp def relu(x): return jnp.maximum(0, x) def predict(params, image): #…
asked by Foobar
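
The short answer, sketched in code (my own check): subtracting logsumexp from the logits is exactly log-softmax, i.e. it normalizes the outputs into log-probabilities in a numerically stable way, without ever exponentiating large numbers.

    # Sketch: logits - logsumexp(logits) == log(softmax(logits)),
    # computed stably even when a naive softmax would overflow.
    import jax.numpy as jnp
    from jax.scipy.special import logsumexp

    logits = jnp.array([1000.0, 1001.0, 1002.0])  # exp(1000) overflows float32

    log_probs = logits - logsumexp(logits)
    print(log_probs)                  # ~[-2.41, -1.41, -0.41]
    print(jnp.exp(log_probs).sum())   # ~1.0: a valid probability distribution
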
2 votes · 1 answer

What is being optimized with WGAN loss? Is the generator maximizing or minimizing the critic value?

I am kind of new to the field of GANs and decided to develop a WGAN. All of the information online seems to contradict itself; the more I read, the more confused I become, so I'm hoping y'all can clarify my misunderstanding of WGAN…
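
For reference, a minimal sketch of the WGAN objectives written as minimization problems (following the original paper's formulation; critic and fake are placeholders here, and the Lipschitz constraint via weight clipping or gradient penalty is omitted):

    # Sketch: the two WGAN losses, written so that both players minimize.
    def critic_loss(critic, real, fake):
        # minimize E[C(fake)] - E[C(real)]  <=>  maximize E[C(real)] - E[C(fake)]
        return critic(fake).mean() - critic(real).mean()

    def generator_loss(critic, fake):
        # minimize -E[C(fake)]  <=>  the generator *maximizes* the critic's score
        return -critic(fake).mean()

Read this way, the answer to the title is: the generator maximizes the critic value on generated samples.
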
2 votes · 2 answers

Why does triplet loss allow learning a ranking, whereas contrastive loss only allows learning similarity?

I am looking at this lecture, which states (link to exact time): "What the triplet loss allows us, in contrast to the contrastive loss, is that we can learn a ranking. So it's not only about similarity, being closer together or being further apart,…
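
A side-by-side sketch of the two standard formulas (not from the lecture itself): contrastive loss pins each pair to an absolute distance target, while triplet loss only constrains the relative order d(a, p) < d(a, n), which is precisely a ranking constraint.

    # Sketch: contrastive loss acts on pairs with absolute targets;
    # triplet loss acts on triplets and only orders the two distances.
    import torch.nn.functional as F

    def contrastive_loss(x1, x2, same, margin=1.0):
        # same = 1: pull the pair toward distance 0; same = 0: push beyond margin
        d = F.pairwise_distance(x1, x2)
        return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

    def triplet_loss(anchor, pos, neg, margin=1.0):
        # only requires d(anchor, pos) + margin < d(anchor, neg): a ranking
        d_pos = F.pairwise_distance(anchor, pos)
        d_neg = F.pairwise_distance(anchor, neg)
        return F.relu(d_pos - d_neg + margin).mean()
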
2 votes · 1 answer

How to handle invalid actions for the next state in the Q-learning loss

I am implementing an RL application in an environment with illegal moves. To handle the illegal moves, I am currently just picking the action with the maximum Q-value among the legal actions. So, it is clear that when deciding on actions we…
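
One common way to keep the loss consistent (a sketch of the usual masking trick, not necessarily what the asker's code does) is to apply the same legality mask to the next state's Q-values before taking the max in the TD target:

    # Sketch: mask illegal actions with -inf so the TD target never
    # bootstraps from a move that could not actually be played.
    import torch

    def td_target(reward, q_next, legal_next, gamma=0.99):
        # reward:     (batch,)            rewards for the transitions
        # q_next:     (batch, n_actions)  Q-values for the next states
        # legal_next: (batch, n_actions)  boolean mask of legal actions
        # (terminal-state handling omitted for brevity)
        masked = q_next.masked_fill(~legal_next, float("-inf"))
        return reward + gamma * masked.max(dim=1).values
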
2 votes · 3 answers

Where does the so-called 'loss' / 'loss function' fit into the idea of a perceptron / artificial neuron (as presented in the figure)?

I am currently studying the textbook Neural Networks and Deep Learning by Charu C. Aggarwal. Section 1.2.1.3, Choice of Activation and Loss Functions, presents the following figure: $\overline{X}$ denotes the features, $\overline{W}$ the weights, and…
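
For concreteness, the classic perceptron criterion from that setting, sketched with made-up numbers: the per-example loss is max(0, -y(W · X)) for a label y in {-1, +1}, which is zero exactly when the sign of the prediction is correct.

    # Sketch: the perceptron criterion for one example; the loss grows
    # linearly with the margin of a mistake and is 0 on correct examples.
    import numpy as np

    def perceptron_loss(w, x, y):
        return max(0.0, -y * np.dot(w, x))

    w = np.array([0.5, -0.2])
    print(perceptron_loss(w, np.array([1.0, 1.0]), +1))  # 0.0 (correct side)
    print(perceptron_loss(w, np.array([1.0, 1.0]), -1))  # 0.3 (misclassified)
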
2 votes · 1 answer

How to perform back-propagation in Decoupled Neural Interfaces?

I am attempting to create a fully decoupled feed-forward neural network by using decoupled neural interfaces (DNIs) as explained in the paper Decoupled Neural Interfaces using Synthetic Gradients (2017) by Max Jaderberg et al. As in the paper, the…
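
A very rough sketch of the mechanism (my own simplification of the paper's idea; layer and sg_module are hypothetical modules, not the paper's exact architecture): a small synthetic-gradient module predicts dL/dh from the activations, the layer updates immediately from that prediction, and the module itself is later regressed onto the true gradient once it arrives.

    # Rough sketch of one decoupled update with a synthetic gradient.
    import torch
    import torch.nn.functional as F

    def forward_and_update(layer, sg_module, x):
        h = layer(x)
        synthetic_grad = sg_module(h.detach())        # predicted dL/dh
        h.backward(gradient=synthetic_grad.detach())  # layer gets gradients now
        return h.detach(), synthetic_grad

    def synthetic_grad_loss(synthetic_grad, true_grad):
        # later, when the real dL/dh arrives from downstream, train the predictor
        return F.mse_loss(synthetic_grad, true_grad.detach())
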
2 votes · 1 answer

What is the "contradictory loss" in the "Old Photo Restoration via Deep Latent Space Translation" paper?

On page 4 of the paper Old Photo Restoration via Deep Latent Space Translation, it says the encoder $E_{R,X}$ of $VAE_1$ tries to fool the discriminator with a contradictory loss to ensure that $R$ and $X$ are mapped to the same space. What do they…
2 votes · 1 answer

Why is L2 loss more commonly used in neural networks than other loss functions?

Why is L2 loss more commonly used in neural networks than other loss functions? What is the reason for L2 being the default choice in neural networks?
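
One standard piece of the answer, sketched below (my own example): L2 is smooth everywhere and its gradient scales with the error, so updates self-damp near the optimum, whereas L1's gradient stays at ±1 no matter how close the prediction is; under maximum likelihood, L2 also corresponds to assuming Gaussian noise.

    # Sketch: gradients of L2 vs. L1 w.r.t. the prediction, for two error sizes.
    import torch

    y_true = torch.tensor(2.0)
    for err in (1.0, 0.1):
        y_pred = torch.tensor(y_true.item() + err, requires_grad=True)
        ((y_pred - y_true) ** 2).backward()
        print("L2 grad:", y_pred.grad.item())  # 2.0, then 0.2: shrinks with the error

        y_pred = torch.tensor(y_true.item() + err, requires_grad=True)
        (y_pred - y_true).abs().backward()
        print("L1 grad:", y_pred.grad.item())  # 1.0 both times
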
2 votes · 1 answer

Do smaller loss values during DQN training produce better policies?

During the training of a DQN, I noticed that the model with prioritized experience replay (PER) had a smaller loss in general compared to a DQN without PER. The mean squared loss was of order $10^{-5}$ for the DQN with PER, whereas the…
asked by calveeen