Questions tagged [cross-entropy]

For questions related to the concept of cross-entropy in the context of artificial intelligence. For example, when the cross-entropy is used as a loss function to train a neural network.

36 questions
19 votes · 1 answer

Why has the cross-entropy become the classification standard loss function and not Kullback-Leibler divergence?

The cross-entropy is identical to the KL divergence plus the entropy of the target distribution. The KL divergence equals zero when the two distributions are the same, which seems more intuitive to me than the entropy of the target distribution,…
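The identity behind this question, written out for a fixed target distribution $p$ and model distribution $q$ (a one-line sketch):

$$H(p, q) = \underbrace{-\sum_x p(x)\log p(x)}_{H(p)} + \underbrace{\sum_x p(x)\log\frac{p(x)}{q(x)}}_{D_{\mathrm{KL}}(p\,\|\,q)} = H(p) + D_{\mathrm{KL}}(p\,\|\,q).$$

Since $H(p)$ does not depend on $q$, minimizing the cross-entropy over $q$ and minimizing the KL divergence over $q$ have the same minimizer.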
10 votes · 2 answers

How do I handle negative rewards in policy gradients with the cross-entropy loss function?

I am using policy gradients in my reinforcement learning algorithm, and occasionally my environment provides a severe penalty (i.e. negative reward) when a wrong move is made. I'm using a neural network with stochastic gradient descent to learn the…
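A minimal numpy sketch (not the asker's actual setup; all values are hypothetical) of how a negative reward interacts with a cross-entropy-style policy-gradient loss:

```python
import numpy as np

def policy_gradient_loss(logits, action, reward):
    """REINFORCE-style per-step loss: -reward * log pi(action).

    A negative reward flips the sign of the gradient, so SGD pushes
    probability away from the penalized action rather than toward it.
    """
    probs = np.exp(logits - logits.max())      # numerically stable softmax
    probs /= probs.sum()
    log_prob = np.log(probs[action] + 1e-12)   # guard against log(0)
    return -reward * log_prob

# A severe penalty (reward = -10) for having taken action 1:
logits = np.array([0.5, 1.5, 0.2])
print(policy_gradient_loss(logits, action=1, reward=-10.0))
```

Minimizing this loss with a negative reward is the same as maximizing the cross-entropy for that action, which is why the sign usually needs no special handling.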
7 votes · 1 answer

How is division by zero avoided when implementing back-propagation for a neural network with sigmoid at the output neuron?

I am building a neural network for which I am using the sigmoid function as the activation function for the single output neuron at the end. Since the sigmoid function is known to take any number and return a value between 0 and 1, this is causing…
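A sketch (with made-up values) of why the division disappears when the sigmoid and the cross-entropy loss are differentiated together rather than term by term:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_grad_termwise(z, y, eps=1e-12):
    """Chain rule applied term by term: the 1/p and 1/(1-p) factors blow up
    as the sigmoid saturates, so implementations clamp p away from 0 and 1."""
    p = np.clip(sigmoid(z), eps, 1 - eps)
    dL_dp = -y / p + (1 - y) / (1 - p)
    dp_dz = p * (1 - p)
    return dL_dp * dp_dz

def bce_grad_fused(z, y):
    """The same gradient after algebraic cancellation: sigmoid(z) - y.
    No division occurs, which is why frameworks fuse the two operations."""
    return sigmoid(z) - y

z, y = 8.0, 1.0   # a saturated output
print(bce_grad_termwise(z, y), bce_grad_fused(z, y))
```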
7 votes · 1 answer

Which loss function should I use in REINFORCE, and what are the labels?

I understand that this is the update for the parameters of a policy in REINFORCE: $$ \Delta \theta_{t}=\alpha \nabla_{\theta} \log \pi_{\theta}\left(a_{t} \mid s_{t}\right) v_{t}, $$ where $v_t$ is usually the discounted future reward and …
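One standard answer, as a hedged numpy sketch: the taken actions play the role of cross-entropy labels and each term is weighted by $v_t$, so minimizing this loss performs the update above:

```python
import numpy as np

def reinforce_loss(logits, actions, returns):
    """Cross-entropy with the *taken* actions as labels, weighted by v_t.

    The gradient of this loss is the batch average of
    -grad log pi(a_t | s_t) * v_t, so one SGD step on it performs
    the REINFORCE update in the question.
    """
    logits = logits - logits.max(axis=1, keepdims=True)   # stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    taken = log_probs[np.arange(len(actions)), actions]
    return -(returns * taken).mean()

# Two timesteps: actions 0 and 1 with discounted returns 1.5 and -0.5.
logits = np.array([[2.0, 0.5], [0.1, 1.0]])
print(reinforce_loss(logits, np.array([0, 1]), np.array([1.5, -0.5])))
```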
5 votes · 2 answers

What is the advantage of using cross-entropy loss & softmax?

I am trying to do the standard MNIST image recognition test with a standard feed-forward NN, but my network failed pretty badly. Now I have debugged it quite a lot and found & fixed some errors, but I had a few more ideas. For one, I am…
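One concrete statement of the advantage, as a numpy sketch: the gradient of cross-entropy composed with softmax reduces to `probs - one_hot`, which stays well-scaled even when the network is confidently wrong (checked here against finite differences):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_loss(z, target):
    return -np.sum(target * np.log(softmax(z)))

z = np.array([5.0, -1.0, 0.3])          # confidently wrong logits
target = np.array([0.0, 1.0, 0.0])      # true class is index 1

analytic = softmax(z) - target          # fused softmax + CE gradient

# Finite-difference check of d(loss)/dz, one coordinate at a time.
eps = 1e-6
numeric = np.array(
    [(ce_loss(z + eps * np.eye(3)[i], target) - ce_loss(z, target)) / eps
     for i in range(3)])
print(analytic)
print(numeric)
```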
4 votes · 3 answers

In logistic regression, why is the binary cross-entropy loss function convex?

I am studying logistic regression for binary classification. The loss function used is cross-entropy. For a given input $x$, if our model outputs $\hat{y}$ instead of $y$, the loss is given by $$\text{L}_{\text{CE}}(y,\hat{y}) = -[y \log \hat{y} +…
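The one-line convexity argument this question is after (a sketch, using $\hat{y} = \sigma(z)$ with $z = w^\top x$, matching the loss above):

$$\frac{\partial L_{\text{CE}}}{\partial z} = \hat{y} - y, \qquad \frac{\partial^2 L_{\text{CE}}}{\partial z^2} = \hat{y}(1-\hat{y}) \ge 0, \qquad \nabla^2_w L_{\text{CE}} = \hat{y}(1-\hat{y})\, x x^{\top} \succeq 0,$$

so the Hessian in $w$ is positive semidefinite everywhere, which is exactly convexity. Note the argument relies on $z$ being linear in $w$ and does not carry over to deep networks.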
4 votes · 1 answer

How to formalize learning in terms of information theory?

Consider the following game on the MNIST dataset: There are 60000 images. You can pick any 1000 images and train your neural network without access to the rest of the images. Your final result is the prediction accuracy on the full dataset. How to formalize…
4 votes · 1 answer

Why does the binary cross-entropy work better than categorical cross-entropy in a multi-class single label problem?

I was just doing a simple NN example with the Fashion-MNIST dataset, where I was getting 97% accuracy, when I noticed that I was accidentally using binary cross-entropy instead of categorical cross-entropy. When I switched to categorical…
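One common explanation (worth verifying against the asker's code) is a metric mismatch: with binary cross-entropy, Keras defaults to binary_accuracy, which scores each of the 10 output units separately. A minimal numpy illustration:

```python
import numpy as np

# One-hot label for class 3 vs. a confident prediction of class 7.
y_true = np.eye(10)[[3]]
y_pred = np.eye(10)[[7]]

# binary_accuracy scores each of the 10 outputs separately: 8 of the 10
# units agree (both are 0), so the model scores 80% despite being wrong.
binary_acc = np.mean((y_pred > 0.5) == (y_true > 0.5))

# categorical_accuracy compares the argmax, the quantity that matters here.
categorical_acc = np.mean(y_pred.argmax(axis=1) == y_true.argmax(axis=1))

print(binary_acc, categorical_acc)   # 0.8 vs 0.0
```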
3 votes · 0 answers

How do I implement the cross-entropy method for an RL environment with a continuous action space?

I found many tutorials and posts on how to solve RL environments with discrete action spaces using the cross-entropy method (e.g., in this blog post for the OpenAI Gym frozen lake environment). However, now I have built my first custom environment,…
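A sketch of the continuous-action variant, assuming a Gaussian sampling distribution; the score function below is a toy stand-in for episode return:

```python
import numpy as np

def cem_continuous(score_fn, dim, iters=30, pop=50, elite_frac=0.2, rng=None):
    """Cross-entropy method over a continuous space: sample from a Gaussian,
    keep the top elite_frac by score, refit the Gaussian to the elites,
    and repeat until the distribution concentrates."""
    if rng is None:
        rng = np.random.default_rng(0)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = int(pop * elite_frac)
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=(pop, dim))
        scores = np.array([score_fn(s) for s in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]   # highest scores
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mu

# Toy "episode return": reward peaks at the action [1.0, -2.0].
target = np.array([1.0, -2.0])
print(cem_continuous(lambda a: -np.sum((a - target) ** 2), dim=2))
```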
3 votes · 2 answers

Where is the mistake in my derivation of the GAN loss function?

I was pondering the GAN loss function, and the following came out: \begin{aligned} L(D, G) & = \mathbb{E}_{x \sim p_{r}(x)} [\log D(x)] + \mathbb{E}_{x \sim p_g(x)} [\log(1 - D(x))] \\ & = \int_x \bigg( p_{r}(x) \log(D(x)) + p_g (x)…
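For reference, the standard next step in this derivation maximizes the integrand pointwise in $D(x)$; setting the derivative of $p_r(x)\log D(x) + p_g(x)\log(1 - D(x))$ to zero gives

$$\frac{p_r(x)}{D(x)} - \frac{p_g(x)}{1 - D(x)} = 0 \quad\Longrightarrow\quad D^*(x) = \frac{p_r(x)}{p_r(x) + p_g(x)},$$

a useful checkpoint against which to compare each line of the expansion.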
3 votes · 0 answers

Is maximum likelihood estimation meaningless for a dataset of only outliers?

From my understanding, maximum likelihood estimation chooses the set of estimator parameters that maximizes the likelihood with respect to the ground-truth distribution. I always interpreted it as the training set having a tendency to have most examples…
2 votes · 2 answers

Why do non-linear activation functions that produce values larger than 1 or smaller than 0 work?

Why do non-linear activation functions that produce values larger than 1 or smaller than 0 work? My understanding is that neurons can only produce values between 0 and 1, and that this assumption can be used in things like cross-entropy. Are my…
2 votes · 1 answer

How does the implementation of the VAE's objective function equate to ELBO?

For many VAE implementations I've seen in code, it's not obvious to me how the objective equates to the ELBO. $$L(X)=H(Q)-H(Q:P(X,Z))=\sum_Z Q(Z)\log P(Z,X)-\sum_Z Q(Z)\log Q(Z)$$ The above is the definition of the ELBO, where $X$ is some input, $Z$ is a latent…
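A hedged numpy sketch of how implementations usually map onto those two terms, assuming a Bernoulli decoder and a diagonal-Gaussian encoder (all values below are hypothetical):

```python
import numpy as np

def negative_elbo(x, x_recon, mu, logvar, eps=1e-7):
    """Negative ELBO as usually coded: reconstruction term + analytic KL.

    E_Q[log P(X|Z)] becomes the binary cross-entropy between x and the
    decoder output; the remaining E_Q[log P(Z)] - E_Q[log Q(Z)] collapses
    to the closed-form KL between N(mu, sigma^2) and the N(0, I) prior.
    """
    x_recon = np.clip(x_recon, eps, 1 - eps)
    recon = -np.sum(x * np.log(x_recon) + (1 - x) * np.log(1 - x_recon))
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))
    return recon + kl

x = np.array([1.0, 0.0, 1.0])
print(negative_elbo(x, x_recon=np.array([0.9, 0.2, 0.7]),
                    mu=np.zeros(2), logvar=np.zeros(2)))
```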
2 votes · 1 answer

How do you manage negative rewards in policy gradients?

This old question has no definitive answer yet, which is why I am asking it again here. I also asked this same question here. If I'm doing policy gradients in Keras, using a loss of the form: rewards*cross_entropy(action_pdf,…
2 votes · 1 answer

How are weights for a weighted cross-entropy loss on imbalanced data calculated?

I am trying to build a classifier that should be trained with the cross-entropy loss. The training data is highly class-imbalanced. To tackle this, I've followed the advice in the TensorFlow docs, and now I am using a weighted cross-entropy loss…
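A sketch of one common weighting recipe (the class counts below are made up; the formula matches the "balanced" heuristic some libraries use): weight each class inversely to its frequency so rare classes contribute comparably to the loss.

```python
import numpy as np

counts = np.array([9000, 700, 300])               # samples per class
weights = counts.sum() / (len(counts) * counts)   # total / (n_classes * count)

def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Per-example cross-entropy scaled by the weight of the true class."""
    picked = probs[np.arange(len(labels)), labels]
    return -(weights[labels] * np.log(picked + eps)).mean()

probs = np.array([[0.80, 0.15, 0.05],
                  [0.30, 0.60, 0.10]])
print(weighted_cross_entropy(probs, labels=np.array([0, 2]), weights=weights))
```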