5

I was watching this series: https://www.youtube.com/watch?v=aircAruvnKk

The series demonstrates neural networks by building a simple digit-recognition network.

It got me thinking: why do neural networks try to recognize multiple labels instead of just one? In the example above, the network tries to recognize the digits 0 through 9. What is the benefit of trying to recognize so many things simultaneously? Wouldn't it be easier to reason about if there were 10 different neural networks, each specializing in recognizing only one number at a time?

Ville
  • You could train 10 different models in this case. Model one would then be trained to learn whether an input is a zero or not, and so forth. This would increase the number of negative training samples for each model. But what happens when you want to use this in practice? A: you would have to perform 10 forward passes instead of just one. B: consider the case where the input is a sloppily written 4 that looks similar to a 9; a single model could learn those slight variations, but with 10 separate models the difference between a 4 and a 9 would never be learned. – Isbister Sep 30 '20 at 15:17
  • Thank you for the answer! But wouldn't single-purpose networks make each individual network simpler? You could have fewer neurons and so on, which (I believe) would make them more performant. Is it 10x more performant than one bigger network (so that it would even out running the picture through 10 trained networks)? I don't know. – Ville Oct 01 '20 at 12:51
  • The point about cases like 4 vs 9 is good, but on the other hand, couldn't both output neurons still produce similar probabilities for 9s and 4s even if they lived in the same network? And when you train the network for, e.g., the number 4, you would still feed it 9s as negative examples, so that learning would still happen, I believe. Would it be learned as well? I'm not sure, and it would be nice to understand which way it goes. I'm trying to understand the practical differences and pros and cons of these approaches. – Ville Oct 01 '20 at 12:51
  • Multiple specialized networks are called ensemble networks. It may indeed be worth training it that way if you are teaching a CNN various representations of the same number. It would just be computationally more expensive to train and run. If you want to compare, create programs as a single network and as an ensemble and try them. Learn by doing. – Nav May 31 '23 at 14:56

3 Answers

1

In practice you rarely want to classify just a single digit; you usually classify a whole series of them. In that case you would have to pass each image patch to multiple networks, which is inconvenient. Even if you built 10 separate accurate models, the number of training parameters would not be significantly reduced. Consider a sloppily written 6: in a single model, the probabilities of 6 and 0 would be close but not the same, and by comparing those likelihoods you can still pick the closest answer. With separate models, the probabilities may vary on very different scales, and you may not get the generalization you have with a single model. In the end everything boils down to generalization, and in my experience neural networks trained on multiple classes have better generalization properties than single-purpose ones.
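To make the probability-scale point concrete, here is a minimal plain-Python sketch. The logits are invented for illustration (they are not from any trained network): a single 10-class model pushes its scores through a softmax, so 6 and 0 compete directly in one distribution, while 10 independent binary models each apply their own sigmoid, producing scores on scales that are not directly comparable.

```python
import math

def softmax(logits):
    """Convert raw scores into one probability distribution over all 10 digits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits a single 10-class network might output for a sloppy "6":
logits = [2.1, -1.0, -0.5, -1.2, -0.8, -0.3, 2.4, -1.5, 0.1, -0.9]
probs = softmax(logits)
print(round(probs[6], 3), round(probs[0], 3))  # 6 and 0 compete directly
print(round(sum(probs), 6))                    # sums to 1 by construction

# Ten independent binary models each apply their own sigmoid, so their
# scores need not sum to 1 and live on incomparable scales.
sigmoid = lambda z: 1 / (1 + math.exp(-z))
binary_scores = [sigmoid(z) for z in logits]
print(round(sum(binary_scores), 2))            # typically far from 1
```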

Darshan
1

Your question seems to be talking about two slightly different topics:

  • Pros and cons of 'one vs rest' approach in multi-class classification
  • Use of Neural Networks in single-output vs multi-class classification problems

One vs Rest in Multi-Class Classification

Recognising digits is an example of multi-class classification. The approach you outline is the kind of approach summarised in the "One vs Rest" section of the Wikipedia page on multi-class classification. The page notes the following issues with this approach:

Firstly, the scale of the confidence values may differ between the binary classifiers. Second, even if the class distribution is balanced in the training set, the binary classification learners see unbalanced distributions because typically the set of negatives they see is much larger than the set of positives.
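The imbalance the quote describes is easy to quantify with a toy calculation (the per-class count of 1000 is made up for illustration):

```python
# With 10 balanced classes of 1000 examples each, a one-vs-rest binary
# learner for any single digit sees a heavily skewed training set.
per_class = 1000
n_classes = 10

positives = per_class                     # examples of "this digit"
negatives = (n_classes - 1) * per_class   # everything else
print(positives, negatives)               # 1000 vs 9000
print(negatives / positives)              # 9.0 negatives per positive
```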

You might also like to look into another approach called One vs One ('One vs Rest' vs 'One vs One'), which sets up the classification problem as a set of binary alternatives. In the digit recognition case you'd end up with a classifier for "1 or 2?", "1 or 3?", "1 or 4?", etc. This might help with the "4 vs 9" problem, but it does mean an enormous number of classifiers, which might be better represented in some kind of network. Perhaps even a network inspired by brain neurons.
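To see how quickly that number of classifiers grows, a quick back-of-the-envelope count in plain Python (just combinatorics, no training involved):

```python
from itertools import combinations

digits = range(10)

# One-vs-One trains a binary classifier for every unordered pair of classes.
ovo_classifiers = len(list(combinations(digits, 2)))
print(ovo_classifiers)        # 10 * 9 / 2 = 45 pairwise classifiers

# One-vs-Rest needs only one classifier per class: 10 for the digits.
print(len(list(digits)))      # 10

# The pairwise count grows quadratically with the number of classes n.
n = 26                        # e.g. recognising letters instead of digits
print(n * (n - 1) // 2)       # 325
```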

Use of Neural Networks in single output vs multi-class classification

There is nothing magical about a neural network that means it has to be used for multi-class classification. Nor is there anything magical about it that makes it the only option for multi-class classification.

For example:
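One illustration, as a hedged sketch with made-up data: multi-class classification works without any neural network at all, e.g. with a simple nearest-centroid rule. (The 2-D feature points and centroids below are invented purely for illustration.)

```python
# A toy nearest-centroid classifier: multi-class classification with no
# neural network involved.

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(point, centroids):
    """Return the label whose class centroid is closest to the point."""
    return min(centroids, key=lambda label: dist2(point, centroids[label]))

# Invented centroids for three digit classes:
centroids = {"0": (0.0, 0.0), "1": (1.0, 1.0), "2": (2.0, 0.0)}
print(nearest_centroid((0.9, 1.1), centroids))  # closest to "1"
```

Conversely, a neural network with a single sigmoid output can be used purely as a binary "is this a 6?" classifier, so neither direction is special to neural networks.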

Conclusions

A 10-class neural network is used to identify digits because this has turned out to be an efficient way of doing so when compared with one-vs-rest and one-vs-one approaches.

A bit off-topic, perhaps, but if you think about this in the context of T5, there does seem to be a trend towards larger, more multi-purpose models rather than lots of small specialised models.

Alan Buxton
0

Imagine a small kid who has no idea about the world around it. You teach the kid how to write the number "6", and that is the only thing it knows.

Now, no matter what other number you show the kid, it will always respond with "6", because that is the only thing it knows or has learned.

You teach the kid how to write the number "9", so now it knows how to differentiate a "6" from a "9" and no matter what other number you show the kid, there is a 50 % chance of it responding with a "6" or a "9" because it knows only that much.

The purpose of a neural network is to learn the underlying distribution of the data, which is what lets it classify different numbers. It's important to have a classifier that understands the general characteristics of numbers and helps us with our task. If you have 10 neural networks trained on 10 different digits, and you show each of these networks the number "10", each network will output the number on which it was trained, because that is all it knows (similar to the naive kid above).

I hope this answers your question!

  • 1
    I guess the kid would know "6 or not 6" in this case? So in the case of 10 different, well-trained neural networks, only one of them would answer "yes". – Ville Dec 03 '20 at 02:07
  • Your question was about why a neural network needs to be trained on multiple labels instead of one, right? In "6" vs "not 6", the model is already trained on 2 labels. – Srivatsan Ramesh Dec 03 '20 at 02:11
  • Nope, there would be a confidence between 0 and 1 for whether it is the number 6 or not. Only 1 label needed. – Ville Dec 03 '20 at 02:27
  • I think the question should have been phrased in terms of recognising multiple digits rather than labels, because labels are local to a model: for a single model to identify whether an input is a 10 or not, it still needs 2 labels (10 or not 10), even though it is recognising one digit. Anyway, to answer your question on having 10 models recognise 10 digits vs 1 model recognising 10 digits: it's just convenience! Would you have 10 people recognise 10 digits for you, or 1 person with the ability to do the work of 10 people? There is nothing wrong with either! It's just convenience, I guess! – Srivatsan Ramesh Dec 03 '20 at 04:08