Your question seems to be talking about two slightly different topics:
- Pros and cons of 'one vs rest' approach in multi-class classification
- Use of Neural Networks in single-output vs multi-class classification problems
## One vs Rest in Multi-Class Classification
Recognising digits is a multi-class classification problem. The approach you outline is essentially the one summarised in the "One vs Rest" section of the Wikipedia page on multi-class classification, which notes the following issues with it:
> Firstly, the scale of the confidence values may differ between the binary classifiers. Second, even if the class distribution is balanced in the training set, the binary classification learners see unbalanced distributions because typically the set of negatives they see is much larger than the set of positives.
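To make this concrete, here is a minimal one-vs-rest sketch. The dataset, classifier and hyperparameters are my own choices for illustration (assuming scikit-learn is available), not anything prescribed by the approach itself. It trains ten binary "digit k or not?" classifiers and predicts with the most confident one, which also shows where the two quoted issues come from: each classifier sees roughly 90% negatives, and their raw scores are not calibrated against each other.

```python
# Hypothetical one-vs-rest sketch (assumes scikit-learn is installed).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train one binary classifier per digit: "is this digit k or not?"
binary_clfs = []
for k in range(10):
    clf = LogisticRegression(max_iter=5000)
    clf.fit(X_train, y_train == k)   # each learner sees ~10% positives, ~90% negatives
    binary_clfs.append(clf)

# Predict by picking the class whose classifier is most confident.
# Note: these decision_function scores are not calibrated across classifiers,
# which is exactly the "differing confidence scales" issue quoted above.
scores = np.column_stack([clf.decision_function(X_test) for clf in binary_clfs])
y_pred = scores.argmax(axis=1)
print("one-vs-rest accuracy:", (y_pred == y_test).mean())
```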
You might also like to look into another approach called One vs One ('One vs Rest' vs 'One vs One'), which sets the problem up as a set of pairwise binary decisions. In the digit-recognition case you would end up with a classifier for "1 or 2?", "1 or 3?", "1 or 4?" and so on. This might help with the "4 vs 9" problem, but it does mean a large number of classifiers: n(n − 1)/2 of them, so 45 for ten digits, growing quadratically with the number of classes. That collection of classifiers might be better represented in some kind of network, perhaps even a network inspired by brain neurons.
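For comparison, here is a one-vs-one sketch under the same assumptions (scikit-learn, reusing the X_train/X_test split from the previous snippet); with ten digits it trains 10 × 9 / 2 = 45 pairwise classifiers, including a dedicated "4 or 9?" one.

```python
# Hypothetical one-vs-one sketch (reuses X_train/X_test/y_train/y_test from above).
from sklearn.multiclass import OneVsOneClassifier
from sklearn.linear_model import LogisticRegression

ovo = OneVsOneClassifier(LogisticRegression(max_iter=5000))
ovo.fit(X_train, y_train)
print("number of pairwise classifiers:", len(ovo.estimators_))  # 45 for 10 classes
print("one-vs-one accuracy:", ovo.score(X_test, y_test))
```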
## Use of Neural Networks in Single-Output vs Multi-Class Classification
There is nothing magical about a neural network that means it has to be used for multi-class classification. Nor is there anything magical about it that makes it the only option for multi-class classification.
For example:
- A neural network with a single output unit can be trained as a purely binary classifier ("is this a 7 or not?"), and ten such networks can be combined in exactly the one-vs-rest scheme described above.
- Conversely, multi-class classification does not require a neural network at all: multinomial logistic regression, decision trees, random forests and naive Bayes all handle ten classes directly, and SVMs can be wrapped in one-vs-rest or one-vs-one. A brief sketch of both points follows.
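Here is a minimal sketch of both points (my own illustration, not from any particular reference), again assuming scikit-learn and reusing the X_train/X_test split from the earlier snippets:

```python
# Hypothetical sketch: a neural network is not tied to multi-class problems,
# and multi-class problems are not tied to neural networks.
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier

# 1. A neural network with a single binary output: "is this a 7?"
seven_detector = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
seven_detector.fit(X_train, y_train == 7)
print("binary '7 detector' accuracy:", seven_detector.score(X_test, y_test == 7))

# 2. A non-network model handling all ten classes directly.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
print("random forest 10-class accuracy:", forest.score(X_test, y_test))
```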
## Conclusions
A neural network with ten output classes is used to identify digits because this has turned out to be an efficient way of doing so when compared with one-vs-rest and one-vs-one approaches.
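For completeness, the single-network formulation that this conclusion refers to looks roughly like this; again only a sketch with assumed hyperparameters, reusing the same data split:

```python
# Hypothetical sketch of the usual approach: one network, ten outputs.
from sklearn.neural_network import MLPClassifier

ten_class_net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)
ten_class_net.fit(X_train, y_train)  # softmax over the 10 classes is handled internally
print("10-class network accuracy:", ten_class_net.score(X_test, y_test))
```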
A bit off-topic, perhaps, but if you think about this in the context of T5, there does seem to be a trend of moving towards larger, more multi-purpose models rather than lots of small, specialised ones.