
Suppose one trains a CNN to determine whether something is a cat/dog or neither of the two (2 classes). Would it be a good idea to assign all cats and dogs to one class and everything else to another? Or would it be better to have a class for cats, a class for dogs, and a class for everything else (3 classes)? My colleague argues for 3 classes because dogs and cats have different features, but I wonder if he's right.

John M.

3 Answers


If you want to determine if something is either a cat/dog or neither, you need 2 classes:

  1. one for dog or cat, and
  2. one for anything else.

However, if you assign all cats and dogs to the same class $A$ and an input is classified as $A$, you won't be able to tell whether it is a dog or a cat; you will only know that it is one of the two.

If you also want to distinguish between cats and dogs (on top of recognising that something is neither), then you'll need $3$ classes.

Finally, if you specify only 2 classes:

  1. dog, and
  2. cat,

then your CNN will try to classify every new input as either a dog or a cat, even when it is neither (e.g. a horse).
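To make the last point concrete, here is a minimal sketch (assuming PyTorch; the tiny network and the random tensor standing in for a horse photo are purely illustrative) of why a {cat, dog}-only classifier can never say "neither": its two probabilities always sum to 1, so even an out-of-class input is forced into one of the two classes.

```python
import torch
import torch.nn as nn

# A deliberately tiny CNN with only two outputs: "cat" and "dog" (no "neither").
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 2),  # logits for {cat, dog} only
)

horse_image = torch.randn(1, 3, 64, 64)     # stand-in for an image that is neither class
probs = model(horse_image).softmax(dim=1)   # e.g. tensor([[0.55, 0.45]])
print(probs.sum())                          # always 1.0: the horse must become a cat or a dog
```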

nsaura
  • To put it in more general terms, I am basically trying to determine whether something is "A or B" or "neither A nor B" (2 classes). There is no need to determine whether an input is A or B. I just thought that if I assign A-objects to the same class as B-objects (the "A or B" class), the CNN may "diffuse" the features of A and B. – John M. Mar 26 '18 at 18:05

The best approach may be to have a cat class, a dog class, and a neither class (3 classes total) and to take a probabilistic approach, i.e. have the network output the probability of each class for any given input. From there, you can always take the per-class probabilities and derive the probability of "cat or dog" versus "neither". Also, make sure you use the right activation on the output layer and the right cost function so that you can interpret the outputs as probabilities (e.g. softmax activation and cross-entropy loss).
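A minimal sketch of that setup, assuming PyTorch (the tiny network and random tensors are placeholders for a real model and dataset): train with cross-entropy over the three classes, then read off per-class probabilities with a softmax and combine them as needed.

```python
import torch
import torch.nn as nn

CAT, DOG, NEITHER = 0, 1, 2

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 3),                 # one logit per class
)
criterion = nn.CrossEntropyLoss()     # applies softmax internally, so the model outputs raw logits

# One illustrative training step on dummy data.
images = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 3, (8,))
loss = criterion(model(images), labels)
loss.backward()

# At inference time, softmax turns the logits into probabilities, and the
# "cat or dog" probability is just the sum of the cat and dog probabilities.
with torch.no_grad():
    probs = model(images[:1]).softmax(dim=1)
    p_cat_or_dog = probs[0, CAT] + probs[0, DOG]
    p_neither = probs[0, NEITHER]
```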

Greenstick
  • This answer does not explain why your suggestion is the best approach. You should [edit your post](https://ai.stackexchange.com/posts/5813/edit) to explain why that's the case. – nbro Sep 25 '20 at 19:46
  • @nbro This was over 2 years ago and the answer has been accepted. It’s certainly not perfect, but unless a specific clarification is asked I don’t intend to revisit it. – Greenstick Sep 26 '20 at 06:50

As far as generalization error is concerned, you are better off learning the data distribution of the A and B classes using an unsupervised criterion.

First capture the underlying factors that explain most of the variation in the A and B classes, and only then fine-tune the model using a supervised criterion. This way, if you use two classes, one for (A or B) and the other for neither (A nor B), you will not force the model to learn features that don't belong to (A or B), because the model essentially checks whether a new data point is likely to have been drawn from a data distribution that resembles (A or B).
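As a rough sketch of that two-stage idea (assuming PyTorch; the autoencoder architecture, losses and random tensors are only placeholders): pretrain an encoder with a reconstruction loss on A/B images only, then fine-tune it with a small supervised head on the {A-or-B, neither} labels.

```python
import torch
import torch.nn as nn

# Encoder to be pretrained on (A or B) images only.
encoder = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(4), nn.Flatten(),           # 16 * 4 * 4 = 256 features
)
decoder = nn.Sequential(
    nn.Linear(256, 3 * 64 * 64), nn.Unflatten(1, (3, 64, 64)),
)

# Stage 1: unsupervised criterion (reconstruction loss) on A/B data.
ab_images = torch.randn(8, 3, 64, 64)                # stand-in for real A/B images
recon_loss = nn.MSELoss()(decoder(encoder(ab_images)), ab_images)
recon_loss.backward()

# Stage 2: supervised fine-tuning of the pretrained encoder with a small head.
classifier = nn.Sequential(encoder, nn.Linear(256, 2))   # classes: {A or B, neither}
images = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 2, (8,))
clf_loss = nn.CrossEntropyLoss()(classifier(images), labels)
clf_loss.backward()
```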

Side note: you will never have the data necessary to explore the internal structure of the "otherwise" class (neither A nor B).

Fadi Bakoura