
In every computer vision project, I struggle with labeling guidelines for border cases. Benchmark datasets don't have this problem, because they are 'cleaned', but in real life, unsure cases often constitute the majority of the data.

Is 15% of a cat's tail a cat? Is a very blurred image of a cat still a cat? Are 4 legs of a horse, with the rest of its body out of the frame, still a horse?

Would it be easier or harder to learn a regression problem instead of classification, i.e., by taking 5 levels of class confidence (0.2, 0.4, 0.6, 0.8, 1.0) and using them as soft targets?
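To make the idea concrete, here is roughly what I have in mind (a minimal PyTorch-style sketch; the five confidence bins and the BCE-with-logits loss on a single "cat-ness" output are just one possible choice, not something I claim is standard):

```python
import torch
import torch.nn as nn

# Sketch: treat "cat-ness" as a soft target in [0, 1] instead of a hard
# 0/1 label. Labelers pick one of five confidence bins for unsure crops.

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 128),
    nn.ReLU(),
    nn.Linear(128, 1),  # single logit for "cat-ness"
)

# BCEWithLogitsLoss accepts soft (fractional) targets in [0, 1] directly.
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.randn(8, 3, 64, 64)               # dummy batch
soft_labels = torch.tensor([1.0, 0.8, 0.2, 0.6,  # labeler confidence bins
                            1.0, 0.4, 0.8, 0.2])

logits = model(images).squeeze(1)
loss = criterion(logits, soft_labels)
loss.backward()
optimizer.step()
```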

Or is it better to just drop every unsure case from the training and/or test set?

I experimented a lot with different options, but wasn't able to reach any definitive conclusion. This problem is so common that I wonder whether someone has already solved it for good.

Huxwell

1 Answer


Unfortunately, the answer here is that "it depends". People have taken different approaches to this problem, and I'll describe a few here; none of them, however, is the "right" answer.

Labeling

When generating benchmark datasets, we actually do have this problem. To be honest, most of the time the labeling is done to the best of the labeler's ability. Sometimes ambiguous or difficult cases are separated out and cleaned, but, usually, labelers are given a set of concrete guidelines for deciding whether or not something is a cat. When a human is unsure, that data is usually thrown out or moved over to a "difficult" pile. Unfortunately, this difficult pile is, for the most part, not released alongside public datasets. And even with significant cleaning, if you look at most public datasets, these cases still exist.

Bayesian Deep Learning

One common theme I've seen is that people add proper probabilistic uncertainties to their models. This is different from the output of a softmax at the end of an object-detection network. The output of a softmax is just some number $\in [0, 1]$ that represents the regressed output of a classification within the model. For example, in SSD the softmax is just the classification of a specific anchor box. There is no real "certainty" information associated with it, and in most standalone models (and without some pretty strong assumptions) it doesn't have any rigorous probabilistic meaning.
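As a toy illustration (the numbers below are made up, not taken from any trained model), a softmax will happily produce a peaked, "confident"-looking distribution over classes no matter where the logits came from, so a high score by itself says nothing about how sure the model really is:

```python
import torch
import torch.nn.functional as F

# Illustration only: arbitrary logits (imagine they came from a noise
# image) still get normalized into a peaked distribution by softmax.
classes = ["cat", "dog", "horse"]
logits_from_noise = torch.tensor([4.1, 0.3, -1.2])

probs = F.softmax(logits_from_noise, dim=0)
for name, p in zip(classes, probs):
    print(f"{name}: {p:.2f}")  # "cat" gets ~0.97 despite a meaningless input
```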

So how would we add a probability? How can we say "there is an 80% chance that this image is a cat"? The paper "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?" does a pretty good job of looking at this problem.

Basically, what they do is build Bayesian models that explicitly output two types of uncertainty:

Aleatoric uncertainty captures noise inherent in the observations. On the other hand, epistemic uncertainty accounts for uncertainty in the model – uncertainty which can be explained away given enough data.

You can go through the paper to get a better understanding of what's going on, but, basically, they fit models that regress both the uncertainty associated with the model and the uncertainty associated with the data.
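As a very rough sketch of the regression form of that idea (my own simplification, not the paper's code; the class and function names are made up): one head predicts the value, a second head predicts a log-variance that absorbs aleatoric noise via the heteroscedastic loss, and keeping dropout active over several forward passes gives a spread that reflects epistemic uncertainty.

```python
import torch
import torch.nn as nn

# Rough sketch (not the paper's exact architecture): a small regression
# model that predicts both a mean and a log-variance, trained with the
# heteroscedastic loss, plus MC dropout at test time for epistemic
# uncertainty.

class UncertainRegressor(nn.Module):
    def __init__(self, in_dim=16):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Dropout(0.5))
        self.mean_head = nn.Linear(64, 1)    # predicted value
        self.logvar_head = nn.Linear(64, 1)  # predicted log sigma^2 (aleatoric)

    def forward(self, x):
        h = self.body(x)
        return self.mean_head(h), self.logvar_head(h)

def heteroscedastic_loss(mean, logvar, target):
    # 0.5 * exp(-s) * (y - mu)^2 + 0.5 * s, with s = log sigma^2
    return (0.5 * torch.exp(-logvar) * (target - mean) ** 2 + 0.5 * logvar).mean()

model = UncertainRegressor()
x, y = torch.randn(32, 16), torch.randn(32, 1)
mean, logvar = model(x)
loss = heteroscedastic_loss(mean, logvar, y)
loss.backward()

# Epistemic uncertainty: keep dropout active and look at the spread of
# predictions across several stochastic forward passes (MC dropout).
model.train()  # keeps Dropout on
with torch.no_grad():
    samples = torch.stack([model(x)[0] for _ in range(20)])
epistemic_std = samples.std(dim=0)  # per-example spread of the mean prediction
```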

Zero-Shot Learning

What about the case where you have a very well defined object, but you've never seen it before? Let's say you've seen a bunch of horses but you've never seen a zebra. As a human, you would look at it for a while and basically just assume that it's a horse with black and white stripes. There is an entire field of machine learning dedicated to this topic. I'm not an expert in it personally, but there are plenty of resources online if you're interested.

A Practical Note

In industry, we usually try to scope the problem as well as we can so that we don't have to deal with this too much, but there are times when it's clearly inevitable. What I've seen is that, if an object isn't clearly detectable, the algorithm might fall back to just saying that there's "something" there. Consider the case of self-driving cars: it's good to detect whether there are pedestrians in the road, but even if you don't know whether something is a pedestrian, it's still useful to know that there's something in the road. For this, you can fall back on unsupervised methods to help distinguish objects.

From a labeling perspective, you could imagine an ontology of objects for this purpose. The root node of this ontology would just be "something in the road", which branches off to "car", "pedestrian", or "bike", for example. If a labeler isn't sure whether something is a pedestrian, but it's definitely something in the road that shouldn't be hit, then it would be labeled as "something in the road". Again, though, this is highly dependent on the application.
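As a toy illustration of that labeling ontology (the class names and helper below are hypothetical, not from any real labeling tool): an unsure labeler simply backs off to the most specific parent node they are still confident about.

```python
# Hypothetical ontology: each leaf class has a parent, and the root of
# this branch is the catch-all "something_in_the_road" label.
PARENT = {
    "pedestrian": "something_in_the_road",
    "car": "something_in_the_road",
    "bike": "something_in_the_road",
    "something_in_the_road": None,  # root of this branch
}

def back_off(label: str, confident: bool) -> str:
    """Return the label itself if the labeler is confident,
    otherwise its parent in the ontology (if one exists)."""
    if confident or PARENT.get(label) is None:
        return label
    return PARENT[label]

print(back_off("pedestrian", confident=False))  # -> something_in_the_road
print(back_off("car", confident=True))          # -> car
```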

juicedatom
  • Just one correction. You say that the softmax probability "specifies how certain the model is that the cat is a cat", but that's not really correct, as you even say that these deterministic models do not really model "uncertainty". See this post https://ai.stackexchange.com/q/24872/2444, for example. Regarding your practical note, this seems to be similar to having another class in your training dataset which is "unclear object". See also this: https://ai.stackexchange.com/q/4889/2444 – nbro Feb 04 '21 at 18:52
  • For the softmax maybe I wasn't using the correct terminology. The softmax output depends on whatever you are regressing. In the case of SSD for example, the softmax represents the classification regression target on the class of the anchor box. – juicedatom Feb 04 '21 at 18:53
  • I'll update the post to reflect this. Thank you! – juicedatom Feb 04 '21 at 18:55
  • Hm, I'm not sure "that represents the output of a regression target within the model." is correct, but maybe I'm misinterpreting this part. The softmax is used in the context of classification, so why are you saying "regression target"? – nbro Feb 04 '21 at 22:37
  • Yes, it is classification, but you are still regressing some value between zero and one. In industry, I've heard the generated labels that are used for SSD and the R-CNN families referred to as "regression targets", which I guess could be confusing. – juicedatom Feb 05 '21 at 23:09
  • Ideally, the softmax would represent the certainty of the model, but, in reality, this is not the case. Since the model learns to minimize error during training, it tries to assign maximum probability to its predictions. This results in a very "confident" model, which assigns over 90% probability most of the time. https://proceedings.neurips.cc/paper/2020/file/aeb7b30ef1d024a76f21a1d40e30c302-Paper.pdf – verdery Feb 05 '21 at 23:26