Questions tagged [loss]

For questions related to the concept of loss (or cost) in machine learning or other AI sub-fields.

79 questions
10 votes · 3 answers

Should I choose a model with the smallest loss or highest accuracy?

I have two machine learning models (I use LSTMs) that give different results on the validation set (~100 samples): Model A: accuracy ~91%, loss ~0.01; Model B: accuracy ~83%, loss ~0.003. The size and the speed of both models are almost the…
asked by malioboro
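
A toy illustration of why the two metrics can disagree (hypothetical numbers, not taken from the question): accuracy only counts thresholded decisions, while cross-entropy loss also penalizes confidence, so one model can win on accuracy and lose on loss.

    # Sketch: accuracy vs. cross-entropy loss on made-up predictions.
    import numpy as np

    y = np.array([1, 1, 1, 1, 1])                   # true binary labels
    p_a = np.array([0.99, 0.99, 0.99, 0.99, 0.01])  # model A: confident, one big mistake
    p_b = np.array([0.60, 0.60, 0.60, 0.45, 0.45])  # model B: cautious, two small mistakes

    def accuracy(y, p):
        return np.mean((p > 0.5) == y)

    def log_loss(y, p):
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    print(accuracy(y, p_a), log_loss(y, p_a))  # 0.8, ~0.93: higher accuracy, higher loss
    print(accuracy(y, p_b), log_loss(y, p_b))  # 0.6, ~0.63: lower accuracy, lower loss
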
3 votes · 1 answer

Has anyone tried to train a GPT model to predict the next N tokens instead of just the next one?

I have been thinking about how learning from text works in humans: we read words, and often we need to read a few words ahead to understand more clearly the ideas we read before. Most of the time, just reading the next word in a sentence is not…
asked by bruno
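
For concreteness, a minimal sketch of what such an objective could look like (my own, and simplistic: it reuses one output head for every offset, whereas models trained this way typically add a separate head per future position):

    # Hypothetical sketch: average the cross-entropy over the next n
    # target positions instead of only position t+1 (teacher forcing).
    import torch.nn.functional as F

    def next_n_token_loss(logits, tokens, n=3):
        # logits: (batch, seq_len, vocab), e.g. from a hypothetical model(tokens);
        # position t is asked to predict the tokens at t+1, ..., t+n.
        losses = []
        for k in range(1, n + 1):
            pred = logits[:, :-k, :]   # drop positions with no k-step-ahead target
            target = tokens[:, k:]     # targets shifted k steps into the future
            losses.append(F.cross_entropy(pred.reshape(-1, pred.size(-1)),
                                          target.reshape(-1)))
        return sum(losses) / n
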
3 votes · 0 answers

How to interpret the training loss curves in Soft Actor-Critic (SAC)?

I am using the stable-baselines3 implementation of the Soft Actor-Critic (SAC) algorithm. The plotted training curves look promising. However, I am not fully sure how to interpret the actor and critic losses. The entropy coefficient $\alpha$ is…
2 votes · 2 answers

Does the MSE loss function work in NN training for predicting values between 0 and 1?

In an NN regression problem, considering that MSE squares the error and the error is between 0 and 1, would it be pointless to use MSE as our loss function during model training? For example: MSE = (y_pred - y_true) ^ 2 # Expected model output…
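
A quick sanity check (my own sketch): squaring does shrink the loss value for errors below 1, but the gradient 2 * (y_pred - y_true) is still proportional to the error, so MSE remains a perfectly usable training signal; only its scale changes.

    # Sketch: for errors in (0, 1) the MSE *value* is small, but the
    # gradient still points the right way and scales with the error.
    import torch

    y_true = torch.tensor(0.7)
    y_pred = torch.tensor(0.4, requires_grad=True)

    loss = (y_pred - y_true) ** 2
    loss.backward()
    print(loss.item())         # 0.09: small, but not meaningless
    print(y_pred.grad.item())  # 2 * (0.4 - 0.7) = -0.6: a healthy gradient
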
2 votes · 0 answers

GAN: Why does a perfect discriminator mean no gradient for the generator?

In the training of a Generative Adversarial Network (GAN), a perfect discriminator (D) is one which outputs 1 ("true image") for all images of the training dataset and 0 ("false image") for all images created by the generator (G). I've read…
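
A small autograd sketch (my own) of the usual saturation argument: with the original log(1 - D(G(z))) generator loss, a near-perfect discriminator drives its output sigmoid into a flat region, so almost no gradient flows back to the generator.

    # Sketch: when D is (near-)perfect, D(G(z)) ~ 0, i.e. the sigmoid
    # logit is very negative, and there d/dlogit log(1 - sigmoid(logit))
    # is ~0: the generator receives essentially no learning signal.
    import torch

    logit = torch.tensor(-20.0, requires_grad=True)  # D is sure the sample is fake
    d_out = torch.sigmoid(logit)                     # ~2e-9
    g_loss = torch.log(1 - d_out)                    # original minimax generator loss
    g_loss.backward()
    print(logit.grad.item())                         # ~ -2e-9: vanishing gradient
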
2 votes · 2 answers

Val loss doesn’t decrease after a certain number of epochs

I’m working on a classification problem (500 classes). My NN has 3 fully connected layers, followed by an LSTM layer. I use nn.CrossEntropyLoss() as my loss function. This is my network’s configuration: Model( (fc): Sequential( (0):…
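
One common pitfall worth ruling out here (a guess, not a diagnosis of this particular model): nn.CrossEntropyLoss applies log-softmax internally, so it must be fed raw logits; an extra Softmax layer at the output squashes the gradients and can make the loss plateau.

    # Sketch: correct usage of nn.CrossEntropyLoss with 500 classes.
    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()
    logits = torch.randn(8, 500)             # raw scores straight from the last layer
    targets = torch.randint(0, 500, (8,))

    loss = criterion(logits, targets)        # correct: feed logits directly
    # wrong: criterion(torch.softmax(logits, dim=1), targets)  # double squashing
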
2 votes · 1 answer

Why do we subtract logsumexp from the outputs of this neural network?

I'm trying to understand this tutorial for JAX. Here's an excerpt; it's for a neural net designed to classify MNIST images: from jax.scipy.special import logsumexp def relu(x): return jnp.maximum(0, x) def predict(params, image): #…
asked by Foobar
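
The short answer, sketched in code (my own check): subtracting logsumexp from the logits is exactly log-softmax, i.e. it normalizes the outputs into log-probabilities in a numerically stable way, without ever exponentiating large numbers.

    # Sketch: logits - logsumexp(logits) == log(softmax(logits)),
    # computed stably even when a naive softmax would overflow.
    import jax.numpy as jnp
    from jax.scipy.special import logsumexp

    logits = jnp.array([1000.0, 1001.0, 1002.0])  # exp(1000) overflows float32

    log_probs = logits - logsumexp(logits)
    print(log_probs)                  # ~[-2.41, -1.41, -0.41]
    print(jnp.exp(log_probs).sum())   # ~1.0: a valid probability distribution
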
2 votes · 1 answer

What is being optimized with WGAN loss? Is the generator maximizing or minimizing the critic value?

I am kind of new to the field of GANs and decided to develop a WGAN. All of the information online seems to contradict itself; the more I read, the more confused I become, so I'm hoping y'all can clarify my misunderstanding of WGAN…
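
For reference, a minimal sketch of the WGAN objectives written as minimization problems (following the original paper's formulation; critic and fake are placeholders here, and the Lipschitz constraint via weight clipping or gradient penalty is omitted):

    # Sketch: the two WGAN losses, written so that both players minimize.
    def critic_loss(critic, real, fake):
        # minimize E[C(fake)] - E[C(real)]  <=>  maximize E[C(real)] - E[C(fake)]
        return critic(fake).mean() - critic(real).mean()

    def generator_loss(critic, fake):
        # minimize -E[C(fake)]  <=>  the generator *maximizes* the critic's score
        return -critic(fake).mean()

Read this way, the answer to the title is: the generator maximizes the critic value on generated samples.
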
2 votes · 2 answers

Why does triplet loss allow learning a ranking, whereas contrastive loss only allows learning similarity?

I am looking at this lecture, which states (link to exact time): "What the triplet loss allows us, in contrast to the contrastive loss, is that we can learn a ranking. So it's not only about similarity, being closer together or being further apart,…
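
A side-by-side sketch of the two standard formulas (not from the lecture itself): contrastive loss pins each pair to an absolute distance target, while triplet loss only constrains the relative order d(a, p) < d(a, n), which is precisely a ranking constraint.

    # Sketch: contrastive loss acts on pairs with absolute targets;
    # triplet loss acts on triplets and only orders the two distances.
    import torch.nn.functional as F

    def contrastive_loss(x1, x2, same, margin=1.0):
        # same = 1: pull the pair toward distance 0; same = 0: push beyond margin
        d = F.pairwise_distance(x1, x2)
        return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

    def triplet_loss(anchor, pos, neg, margin=1.0):
        # only requires d(anchor, pos) + margin < d(anchor, neg): a ranking
        d_pos = F.pairwise_distance(anchor, pos)
        d_neg = F.pairwise_distance(anchor, neg)
        return F.relu(d_pos - d_neg + margin).mean()
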
2 votes · 1 answer

How to handle invalid actions for the next state in the Q-learning loss

I am implementing an RL application in an environment with illegal moves. To handle the illegal moves, I am currently just picking the action with the maximum Q-value among the legal actions. So, it is clear that when deciding on actions we…
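
One common way to keep the loss consistent (a sketch of the usual masking trick, not necessarily what the asker's code does) is to apply the same legality mask to the next state's Q-values before taking the max in the TD target:

    # Sketch: mask illegal actions with -inf so the TD target never
    # bootstraps from a move that could not actually be played.
    import torch

    def td_target(reward, q_next, legal_next, gamma=0.99):
        # reward:     (batch,)            rewards for the transitions
        # q_next:     (batch, n_actions)  Q-values for the next states
        # legal_next: (batch, n_actions)  boolean mask of legal actions
        # (terminal-state handling omitted for brevity)
        masked = q_next.masked_fill(~legal_next, float("-inf"))
        return reward + gamma * masked.max(dim=1).values
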
2 votes · 3 answers

Where does the so-called 'loss' / 'loss function' fit into the idea of a perceptron / artificial neuron (as presented in the figure)?

I am currently studying the textbook Neural Networks and Deep Learning by Charu C. Aggarwal. Section 1.2.1.3, Choice of Activation and Loss Functions, presents the following figure: $\overline{X}$ denotes the features, $\overline{W}$ the weights, and…
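
For concreteness, the classic perceptron criterion from that setting, sketched with made-up numbers: the per-example loss is max(0, -y(W · X)) for a label y in {-1, +1}, which is zero exactly when the sign of the prediction is correct.

    # Sketch: the perceptron criterion for one example; the loss grows
    # linearly with the margin of a mistake and is 0 on correct examples.
    import numpy as np

    def perceptron_loss(w, x, y):
        return max(0.0, -y * np.dot(w, x))

    w = np.array([0.5, -0.2])
    print(perceptron_loss(w, np.array([1.0, 1.0]), +1))  # 0.0 (correct side)
    print(perceptron_loss(w, np.array([1.0, 1.0]), -1))  # 0.3 (misclassified)
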
2 votes · 1 answer

How to perform back-propagation in Decoupled Neural Interfaces?

I am attempting to create a fully decoupled feed-forward neural network by using decoupled neural interfaces (DNIs) as explained in the paper Decoupled Neural Interfaces using Synthetic Gradients (2017) by Max Jaderberg et al. As in the paper, the…
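
A very rough sketch of the mechanism (my own simplification of the paper's idea; layer and sg_module are hypothetical modules, not the paper's exact architecture): a small synthetic-gradient module predicts dL/dh from the activations, the layer updates immediately from that prediction, and the module itself is later regressed onto the true gradient once it arrives.

    # Rough sketch of one decoupled update with a synthetic gradient.
    import torch
    import torch.nn.functional as F

    def forward_and_update(layer, sg_module, x):
        h = layer(x)
        synthetic_grad = sg_module(h.detach())        # predicted dL/dh
        h.backward(gradient=synthetic_grad.detach())  # layer gets gradients now
        return h.detach(), synthetic_grad

    def synthetic_grad_loss(synthetic_grad, true_grad):
        # later, when the real dL/dh arrives from downstream, train the predictor
        return F.mse_loss(synthetic_grad, true_grad.detach())
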
2 votes · 1 answer

What is the "contradictory loss" in the "Old Photo Restoration via Deep Latent Space Translation" paper?

On page 4 of the paper Old Photo Restoration via Deep Latent Space Translation, it says the encoder $E_{R,X}$ of $VAE_1$ tries to fool the discriminator with a contradictory loss to ensure that $R$ and $X$ are mapped to the same space. What do they…
2 votes · 1 answer

Why is L2 loss more commonly used in neural networks than other loss functions?

Why is L2 loss more commonly used in neural networks than other loss functions? What is the reason for L2 being the default choice in neural networks?
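
One standard piece of the answer, sketched below (my own example): L2 is smooth everywhere and its gradient scales with the error, so updates self-damp near the optimum, whereas L1's gradient stays at ±1 no matter how close the prediction is; under maximum likelihood, L2 also corresponds to assuming Gaussian noise.

    # Sketch: gradients of L2 vs. L1 w.r.t. the prediction, for two error sizes.
    import torch

    y_true = torch.tensor(2.0)
    for err in (1.0, 0.1):
        y_pred = torch.tensor(y_true.item() + err, requires_grad=True)
        ((y_pred - y_true) ** 2).backward()
        print("L2 grad:", y_pred.grad.item())  # 2.0, then 0.2: shrinks with the error

        y_pred = torch.tensor(y_true.item() + err, requires_grad=True)
        (y_pred - y_true).abs().backward()
        print("L1 grad:", y_pred.grad.item())  # 1.0 both times
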
2 votes · 1 answer

Do smaller loss values during DQN training produce better policies?

During the training of a DQN, I noticed that the model with prioritized experience replay (PER) had a smaller loss in general compared to a DQN without PER. The mean squared loss was of order $10^{-5}$ for the DQN with PER, whereas the…
asked by calveeen