4

Suppose I have a standard image classification problem (i.e. a CNN is shown a single image and predicts a single class for it). If I were to use bounding boxes to surround the target object (i.e. convert this into an object detection problem), would this increase classification accuracy purely through the use of the bounding box?

I'm curious if the neural network can be "assisted" by us when we show it bounding boxes as opposed to just showing it the entire image and letting it figure it all out by itself.

nbro
user4779

4 Answers

0

Another way to ask the question is: Does sound get clearer when you remove the background noise?

The obvious answer is yes, and in the case of image classification, the answer is also generally yes.

In most cases reducing the noise (irrelevant pixels) will strengthen the signal (activations) the neural network is trying to find.
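As a minimal sketch of what "reducing the noise" means in practice, here is how one might crop an image to a bounding box before classification (a toy example with NumPy; the bbox format `(x, y, w, h)` is just one common convention, not anything mandated by a particular framework):

```python
import numpy as np

def crop_to_bbox(image, bbox):
    """Crop an image of shape (H, W, C) to a bounding box (x, y, w, h)."""
    x, y, w, h = bbox
    return image[y:y + h, x:x + w]

# A toy 100x100 RGB image with a bright 20x20 "object" at (x=40, y=30).
image = np.zeros((100, 100, 3), dtype=np.uint8)
image[30:50, 40:60] = 255

crop = crop_to_bbox(image, (40, 30, 20, 20))
# The crop contains only object pixels; the background "noise" is gone.
```

The classifier then only ever sees the cropped region, which is exactly the "stronger signal" described above.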

IsakBosman
  • Thanks for the answer, but I'm still a little confused. If the network is being forced to recognize only the region in my bounding box, thus reducing the noise, how can it then generalize to the noise outside the bounding box in a live/test sample if it's never been trained with that noise? I also read the answer here: https://ai.stackexchange.com/questions/10177/whats-the-role-of-bounding-boxes-in-object-detection which has left me further confused. Does the NN actively learn to ignore the noise outside its bounding box, or is the only thing it's exposed to within the bounding box? – user4779 Apr 23 '19 at 14:23
0

For sure, this may be helpful. You remove excessive information from the image and make the classification task a bit simpler. But you need to remember that bounding boxes may not work perfectly, and the accuracy of the classification algorithm may suffer from corrupted inputs (i.e. when the bounding boxes themselves are wrong).

antoleb
0

Information outside of the bounding box could still be useful as context. So, in order not to lose it, you can put the bounding box as a pixel mask into an additional "pseudo-color" channel. That way you can also have many bounding boxes without changing the input architecture. You give the network additional information without losing anything, so the result shouldn't be worse, at least.
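A small sketch of this idea (assuming channels-last `(H, W, C)` images and `(x, y, w, h)` boxes; the function name is just for illustration): the boxes are rasterized into a binary mask that is concatenated to the image as an extra channel.

```python
import numpy as np

def add_bbox_channel(image, bboxes):
    """Append a binary mask channel marking the given bounding boxes.

    image:  array of shape (H, W, C)
    bboxes: list of (x, y, w, h) tuples
    """
    h_img, w_img = image.shape[:2]
    mask = np.zeros((h_img, w_img, 1), dtype=image.dtype)
    for x, y, w, h in bboxes:
        mask[y:y + h, x:x + w] = 1  # mark box interiors
    return np.concatenate([image, mask], axis=-1)

image = np.zeros((64, 64, 3), dtype=np.float32)
augmented = add_bbox_channel(image, [(10, 10, 20, 20), (40, 40, 10, 10)])
# augmented has shape (64, 64, 4): RGB plus the bbox mask channel.
```

The network's first convolution just needs one extra input channel; everything outside the boxes is still visible to it as context.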

mirror2image
0

One could imagine using a segmentation network as a first step of processing, and then feeding the area corresponding to the bounding box of each segmented object to the classifier.

Potentially, that could yield an increase in performance when classifying objects in an image, but not without a cost in training time, since suddenly there are two networks to train instead of just one.
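The two-stage pipeline described above can be sketched as follows. Both `segment` and `classify` here are trivial stand-ins (a brightness threshold instead of real networks), purely to show how the stages plug together:

```python
import numpy as np

def segment(image):
    """Stand-in for a segmentation network: returns bounding boxes (x, y, w, h).
    Here we simply take the extent of all nonzero pixels as one box."""
    ys, xs = np.nonzero(image.sum(axis=-1) > 0)
    if len(xs) == 0:
        return []
    return [(xs.min(), ys.min(), xs.max() - xs.min() + 1, ys.max() - ys.min() + 1)]

def classify(crop):
    """Stand-in for a classifier network: labels a crop by mean brightness."""
    return "bright" if crop.mean() > 127 else "dark"

def detect_and_classify(image):
    """Stage 1: segment into boxes. Stage 2: classify each cropped box."""
    labels = []
    for x, y, w, h in segment(image):
        labels.append(classify(image[y:y + h, x:x + w]))
    return labels

image = np.zeros((32, 32, 3), dtype=np.uint8)
image[8:16, 8:16] = 200  # one bright square "object"
print(detect_and_classify(image))  # ['bright']
```

In a real system, each stand-in would be replaced by its own trained network, which is exactly where the doubled training cost mentioned above comes from.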