
I am very new to machine learning. I am following the course offered by Andrew Ng. I am very confused about how we train our neural network for multi-class classification.

Let's say we have $K$ classes. For $K$ classes, we will be training $K$ different neural networks.

But do we train one neural network at a time for all features, or do we train all $K$ neural networks at a time for one feature?

Please, explain the complete procedures.


2 Answers


Let us suppose that you are training a neural network to classify images of vehicles. The input, an image of a vehicle, is a 2D array of pixels. It undergoes several transformations, one at each layer of the network, and the last layer produces another vector whose dimension is smaller than that of the original image vector.

So the network is mapping images to vectors in a high-dimensional space. To classify the images, it is now sufficient to classify the vectors that the network produces for the corresponding images, and you can do this with a simple "linear" classifier using a softmax layer.

So all the layers of the network except the last one transform the image representation into a feature vector, and this vector is then classified by a linear softmax classifier in the last layer of the network.
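As a rough illustration of this pipeline, here is a minimal NumPy sketch of a single forward pass. The image, layer sizes, and random weights are all placeholders, not anything from a real trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up 8x8 "image" standing in for the 2D array of pixels
image = rng.random((8, 8))
x = image.reshape(-1)                      # flatten: 64 pixel values

# Hidden layers: transform the image into a smaller feature vector
W1, b1 = rng.normal(size=(16, 64)), np.zeros(16)
features = np.maximum(0.0, W1 @ x + b1)    # 64 dims -> 16 dims (ReLU)

# Last layer: a linear classifier over the feature vector, plus softmax
K = 5                                      # e.g. 5 vehicle classes
W2, b2 = rng.normal(size=(K, 16)), np.zeros(K)
scores = W2 @ features + b2
probs = np.exp(scores - scores.max())
probs /= probs.sum()                       # one probability per class
print(probs)
```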


> Let's say we have $K$ classes. For $K$ classes, we will be training $K$ different neural networks.

No, you still train one network.

With binary classification tasks, where you have only two mutually exclusive categories, like "yes/no" or "true/false", you can get away with a single output node with a sigmoid activation. The output of the sigmoid is interpreted as indicating one category for values $> 0.5$ and the other for values $\leq 0.5$.
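For illustration, a minimal sketch of that decision rule (the logit value below is made up):

```python
import numpy as np

def sigmoid(z):
    # Squash a raw score into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

logit = 0.8                     # made-up raw output of the single node
p = sigmoid(logit)              # ~0.69
category = 1 if p > 0.5 else 0  # one category for p > 0.5, the other otherwise
print(p, category)
```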

With multi-class classification, you have $K$ outputs (one for each category). The problem, in this case, is that the $K$ raw outputs are just unnormalized scores: if the network gets the class wrong, in general, you cannot decide in one step which of the other $K - 1$ categories is the correct one. So, the output is passed through an extra softmax layer, which turns the $K$ scores into a probability distribution over the classes (non-negative values that sum to $1$).
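A minimal sketch of what that softmax layer computes (the logit values below are made up):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Made-up raw outputs (logits) for K = 4 classes
logits = np.array([2.0, 1.0, 0.1, -1.2])
probs = softmax(logits)
print(probs)             # roughly [0.64, 0.24, 0.10, 0.03]
print(probs.sum())       # 1.0: a proper distribution over the K classes
print(np.argmax(probs))  # 0: the predicted class
```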

> But do we train one neural network at a time for all features, or do we train all $K$ neural networks at a time for one feature?

You present all features for each training example to the network at the same time. So, for $N$ features you have $N$ input nodes, and you feed all of them into the neural network.
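A minimal NumPy sketch of this, with made-up sizes and random weights, and the network reduced to a single linear layer for brevity: each row of the input matrix carries all $N$ features of one example, and the one network produces $K$ scores per example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 4, 3                      # N features, K classes (made-up sizes)

# Three training examples; each row carries all N features at once
X = rng.normal(size=(3, N))

# One network, here reduced to a single linear layer for brevity
W, b = rng.normal(size=(N, K)), np.zeros(K)
logits = X @ W + b               # shape (3, K): K scores per example

# Softmax row by row: a probability distribution over K classes per example
e = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = e / e.sum(axis=1, keepdims=True)
print(probs.shape)               # (3, 3)
```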

  • Okay, it means that for $K$ classes and $M$ training examples we will be training our neural network $K \times M$ times. – Reena Kandari Jul 08 '18 at 12:01
  • One more doubt, sir: do we apply gradient descent (or any variant) when using a neural network? In [this](https://github.com/m-a-y-a-n-k/Weather-Prediction-Using-Neural-Networks/blob/master/Predictor.m) code they are applying advanced optimization before backpropagation. As I understand it, in BP we are already finding the best set of weights, so applying gradient descent is irrelevant in this case. Am I correct, or am I missing something? – Reena Kandari Jul 08 '18 at 12:08
  • @ReenaKandari If you have M data points, you would be using a subset (generally, between `60%` and `80%` of `M`) for training, and you do that multiple times (= epochs). However, you don't have to train the network `K` times for each training example - the whole point of softmax is that the correct class is "promoted" while *all* others are "suppressed" *simultaneously*, if that makes sense (see the sketch after these comments). – cantordust Jul 09 '18 at 22:32
  • @ReenaKandari Re your other question: yes, in the overwhelming majority of cases neural networks are optimised using gradient descent. I am not proficient in Matlab, but it seems to me that in the example you provided they are minimising a cost function using gradient descent. – cantordust Jul 09 '18 at 22:33
  • Sir @cantordust, actually I am not aware of how softmax works and how classes are 'promoted' and 'suppressed' simultaneously. Can you please explain a little more how these things work in terms of a neural network? – Reena Kandari Jul 14 '18 at 02:32
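To make the "promoted"/"suppressed" point from these comments concrete (the numbers below are made up): with a softmax output and cross-entropy loss, the gradient of the loss with respect to the raw scores works out to `probs - one_hot_target`, so a single gradient-descent step raises the correct class's score and lowers all the others at the same time.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([1.0, 2.0, 0.5])   # made-up scores for K = 3 classes
y = np.array([0.0, 1.0, 0.0])        # one-hot target: class 1 is correct

p = softmax(logits)
grad = p - y   # gradient of cross-entropy loss w.r.t. the logits

# Descending the gradient lowers the scores with grad > 0 (wrong classes)
# and raises the score with grad < 0 (the correct class), all in one step.
print(np.round(grad, 2))   # [ 0.23 -0.37  0.14]
```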