
My question is: how to add certain negative samples to the training dataset to suppress those samples that are recognized as the object.

For example, if I want to train a car detector. All my training images are outdoor images with at least one car. However, when I use the trained detector on indoor images, sometimes I got the wrong object detected (false positive). How can I add more indoor images (negative samples) to the training dataset to improve the accuracy? Can I just add them without any labeling?

  • What do you mean by "Can I just add them without any labeling?"? How are you training your neural network? Aren't you training your model in a supervised way? – nbro Jun 23 '21 at 10:02
  • 1
    I am training an SSD model for object detection. Each of my training images contains at least one object, and the objects are labeled (bounding boxes) manually. When I test my model on certain scenarios (not in the training set) I get many false positive results. I am just wondering how I should improve the model. Can I add those failure images to the training set without labeling (because there are no objects in those images)? Hope it is clear – fnhdx Jun 23 '21 at 13:49

1 Answer


The quick answer: yes, you can simply add images without labels. Just make sure the negative samples really contain no cars, or you will confuse the model (i.e. cause convergence and instability issues).
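As a concrete sketch of what "adding without labels" can look like in practice: many detection pipelines (e.g. COCO-style annotation files) let you list a background-only image with simply no box entries attached to it. The dictionary layout and helper below are illustrative, not any specific library's API:

```python
# Hypothetical COCO-style annotation structure: negative (background-only)
# images appear in "images" but have no entries in "annotations".
dataset = {
    "images": [
        {"id": 1, "file_name": "street_01.jpg"},   # outdoor image with a car
        {"id": 2, "file_name": "kitchen_07.jpg"},  # indoor negative sample
    ],
    "annotations": [
        # one labeled car box for image 1; image 2 intentionally has none
        {"image_id": 1, "category_id": 1, "bbox": [48, 120, 200, 90]},
    ],
}

def boxes_for(image_id):
    """Return the list of ground-truth boxes for an image (empty = negative)."""
    return [a["bbox"] for a in dataset["annotations"]
            if a["image_id"] == image_id]
```

During training, the image with an empty box list contributes only background (negative) classification targets, which is exactly the effect you want from the indoor images.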

However, that might not be the best approach. Why? Because your dataset already has plenty of negative examples. This was pointed out by the well-known paper Focal Loss for Dense Object Detection. The paper essentially proposes thinking of each pixel of a dataset image as a training signal. Then every image contains many pixels carrying negative signal (nothing of interest: sky, ground, trees...) and only a few carrying positive signal (the actual car).

So if each image of the dataset already has more negative signals (pixels) than positive ones, then the problem is probably not a lack of negative examples. That leaves you with 2 ways to go:

  • Use a loss function that focuses more on the positive signal (car pixels) than on the negative signal (non-car pixels), such as focal loss or one of its derivatives
  • Add more positive examples to the dataset
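For the first option, the binary focal loss from the paper is short enough to write out. A minimal NumPy sketch (the `alpha` and `gamma` defaults follow the paper's recommended values; setting `gamma=0` recovers plain alpha-weighted cross-entropy):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss from "Focal Loss for Dense Object Detection".

    p: predicted probability of the positive class, in (0, 1)
    y: ground-truth label, 1 for positive (car) and 0 for negative (background)
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)
    # p_t: probability the model assigned to the *true* class
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    # (1 - p_t)**gamma down-weights easy, well-classified examples,
    # so the flood of easy background pixels no longer dominates the loss
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)
```

The key property is that an easy, confidently classified background pixel (say `p = 0.01`, `y = 0`) contributes a loss several orders of magnitude smaller than a hard, misclassified car pixel (`p = 0.1`, `y = 1`), which is how the loss "focuses" on the rare positive signal.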

I can confirm what this paper states in my everyday experiments. We currently have a battery of experiments in which training with focal loss and no negative examples performs best, compared with experiments using negative examples but no focal loss.

Just for reference, this is what happens when there are lots of negative examples:

[Image: training loss curve that stays flat for roughly 1M steps before starting to converge.]

The model took a while (about 1M steps) in this experiment to figure out that the negative samples were not useful. From then on it focused on the positive samples, the training started to converge, and inference started to produce something meaningful.

JVGD