Questions tagged [softmax]

For questions related to the softmax function, which is a function that takes a vector of K real numbers as input and normalizes it into a probability distribution of K probabilities proportional to the exponentials of the input numbers. The softmax is often used as the activation function of the output layer of a neural network.
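The definition above can be sketched in a few lines of NumPy (the function name and input values are illustrative, not from any particular question):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: subtracting max(z) does not change
    the result (softmax is shift-invariant) but prevents overflow in exp."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

# illustrative input: three arbitrary scores
p = softmax([1.0, 2.0, 3.0])
print(p)  # non-negative, sums to 1, proportional to e^1 : e^2 : e^3
```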

36 questions
17
votes
2 answers

Are softmax outputs of classifiers true probabilities?

BACKGROUND: The softmax function is the most common choice for an activation function for the last dense layer of a multiclass neural network classifier. The outputs of the softmax function have mathematical properties of probabilities and are--in…
6
votes
2 answers

Why do the TensorFlow docs discourage using softmax as the activation for the last layer?

The beginner Colab example for TensorFlow states: Note: It is possible to bake this tf.nn.softmax in as the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is…
galah92 • 163 • 5
5
votes
2 answers

What is the advantage of using cross-entropy loss & softmax?

I am trying to do the standard MNIST dataset image recognition test with a standard feed forward NN, but my network failed pretty badly. Now I have debugged it quite a lot and found & fixed some errors, but I had a few more ideas. For one, I am…
5
votes
1 answer

Which paper introduced the term "softmax"?

Nowadays, the softmax function is widely used in deep learning and, specifically, classification with neural networks. However, the origins of this term and function are almost never mentioned anywhere. So, which paper introduced this term?
nbro • 39,006 • 12 • 98 • 176
4
votes
1 answer

Why are policy gradient methods more effective in high-dimensional action spaces?

David Silver argues, in his Reinforcement Learning course, that policy-based reinforcement learning (RL) is more effective than value-based RL in high-dimensional action spaces. He points out that the implicit policy (e.g., $\epsilon$-greedy) in…
2
votes
1 answer

Why do we use the softmax instead of no activation function?

Why do we use the softmax activation function on the last layer? Suppose $i$ is the index with the highest value (in the case where we don't use softmax at all). If we use softmax and take the $i$th value, it would still be the highest value because $e$ is…
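As the excerpt above observes, exp is strictly increasing, so applying softmax never changes which index is largest; a small sketch (the logit values are illustrative):

```python
import numpy as np

def softmax(z):
    # shift by the max for numerical stability; result is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.3, -0.7, 5.1, 0.2])  # arbitrary example scores
probs = softmax(logits)

# exp is monotone, so the argmax (the predicted class) is preserved
print(np.argmax(logits) == np.argmax(probs))  # True
```

Softmax therefore matters for training (it yields probabilities that a cross-entropy loss can consume), not for changing which class wins at inference time.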
2
votes
2 answers

What do the authors of this paper mean by the bias term in this picture of a neural network implementation?

I am reading a paper implementing a deep deterministic policy gradient algorithm for portfolio management. My question is about a specific neural network implementation they depict in this picture (paper, picture is on page 14). The first three…
1
vote
1 answer

Dealing with noise in models with softmax output

I have a device with an accelerometer and gyroscope (6-axis). The device sends live raw telemetry data to the model: 40 samples per input, 6 values per sample (accelerometer xyz, gyroscope xyz). The model predicts between 12 different labels of…
1
vote
1 answer

Number of units in the final softmax layer in VGGNet16

I am trying to implement and train the neural network model VGGNet from scratch on my own data. I am reproducing all the layers of the model. I am confused about the last, fully connected softmax layer. In the research paper by Simonyan and…
1
vote
2 answers

Backpropagation with cross-entropy and softmax: how?

Let Zs be the input of the output layer (for example, Z1 is the input of the first neuron in the output layer), Os be the output of the output layer (which are actually the results of applying the softmax activation function to Zs, for example, O1 =…
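For the combined softmax + cross-entropy backward pass asked about above, the gradient of the loss with respect to the logits famously simplifies to p − y (predicted probabilities minus the one-hot target). A small NumPy check against finite differences (all names and values are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(z, y):
    # cross-entropy for a one-hot target y given logits z
    return -np.log(softmax(z)[np.argmax(y)])

z = np.array([0.5, -1.2, 2.0])   # example logits
y = np.array([0.0, 1.0, 0.0])    # one-hot target

analytic = softmax(z) - y        # the p - y shortcut

# central finite-difference check of the same gradient
eps = 1e-6
numeric = np.array([
    (loss(z + eps * np.eye(3)[i], y) - loss(z - eps * np.eye(3)[i], y)) / (2 * eps)
    for i in range(3)
])
print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

This is why frameworks fuse softmax and cross-entropy into one op: the combined gradient is both simpler and numerically better behaved than chaining the two separately.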
1
vote
1 answer

Why are SVMs / Softmax classifiers considered linear while neural networks are non-linear?

My understanding is that neural networks are definitely not linear classifiers, as the point of functions like ReLU is to introduce non-linearity. However, here's where my understanding starts to break down. A classifier, like Softmax or SVM is…
1
vote
1 answer

Trouble writing the backpropagation algorithm in Python with cross-entropy and softmax

So I am writing my own neural network library for a class project, and I got everything working for a simple 2-class test using the distance (L2) cost function. I wanted to get a similar result using softmax and cross-entropy instead. I did the…
1
vote
0 answers

Use softmax post-training for a ReLU-trained network?

For a project, I've trained multiple networks for multiclass classification all ending with a ReLU activation at the output. Now the output logits are not probabilities. Is it valid to get the probability of each class by applying a softmax function…
1
vote
1 answer

Is it normal that the values of the LogSoftmax function are very large negative numbers?

I have trained a classification network with PyTorch lightning where my training step looks like below: def training_step(self, batch, batch_idx): x, y = batch y_hat = self(x) loss = F.cross_entropy(y_hat, y) self.log("train_loss",…
pd109 • 125 • 4
1
vote
1 answer

Is it appropriate to use a softmax activation with a categorical crossentropy loss?

I have a binary classification problem with 2 classes. A sample is either class 1 or class 2 - for simplicity, let's say they are exclusive from one another, so it is definitely one or the other. For this reason, in my neural network, I have…