I'm working on a project with a limited dataset of about 200 videos. We want to train a model that can detect a single class in the videos. Instances of that class can take very different shapes (a thin wire, a huge area of the screen, etc.).

There are three options for how we can label this data:

  1. Image classification (somewhere in the image is this class)
  2. Bounding box (in this area, there is the class)
  3. Semantic segmentation (these pixels are the class)
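
To make the relationship between the three label types concrete, here is a minimal sketch assuming NumPy and a boolean per-pixel mask. The helper names `mask_to_box` and `mask_to_image_label` are hypothetical, made up for illustration. It shows that a segmentation mask can be reduced to a bounding box and to an image-level label, but not the other way around.

```python
import numpy as np

def mask_to_box(mask):
    """Return (x_min, y_min, x_max, y_max) enclosing all True pixels, or None if the mask is empty."""
    ys, xs = np.nonzero(mask)  # row and column indices of the labeled pixels
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def mask_to_image_label(mask):
    """Return True if the class appears anywhere in the image."""
    return bool(mask.any())

# Toy 6x8 frame containing a thin horizontal "wire" on row 2 (illustrative data only).
mask = np.zeros((6, 8), dtype=bool)
mask[2, 1:7] = True

box = mask_to_box(mask)            # option 2 derived from option 3
label = mask_to_image_label(mask)  # option 1 derived from option 3
print(box, label)
```

So labeling with masks keeps the other two options open, at a higher annotation cost per frame.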

My assumption is that if the model was trained on semantic segmentation data it would perform slightly better than bounding box data. I'm also assuming it would perform way better than if the model only learned on image classification data. Is that correct?

nbro
NateW

1 Answer

It depends on what your ultimate goal is. If your goal is simply to classify whether the object is in the image, a more complex output won't help: a simpler output representation yields better results. If your goal is to detect the bounding box, output the bounding box; there is no need for a more complex output. If you use a segmentation method for bounding box detection, it is more prone to error because of its excess output features.

Suppose you are given a grade 6 math test. If you answer the questions using grade 12 math, with calculus and so on, to make your calculations seem more complex, will you get a higher mark than by doing it the normal way? No! The mark is the same, or even lower because of the higher chance of error in complex calculations.

In short, more complex labels won't help if your task is a simple one. Hope this helps, and have a nice day!

Clement
  • Thanks for the response, Clement! Yes, I just want to classify if the class is in the image. I'm surprised by your answer. Do you have any other resources to back up the claim? When I googled the question I only found one paper: https://vision.cornell.edu/se3/wp-content/uploads/2014/09/cs2007-0908.pdf. It claims that "Segmenting an image does improve object categorization accuracy" – NateW Nov 05 '19 at 00:54
  • If you look at the COCO detection leaderboards (http://cocodataset.org/#detection-leaderboard), you can see that segmentation accuracy is lower than bbox accuracy. Hope that helps. – Clement Nov 05 '19 at 05:05