I've come across a few binary classification problems lately where labelling was challenging even for an expert. I'm wondering how such hard-to-label examples should be handled. Here are some options to get the ball rolling:

  1. Make a third category called "unsure" and treat the task as a three-class classification problem instead.
  2. Make a third category called "unsure" and just remove these examples from the training set.
  3. Make a third category called "unsure" and, during training, model it as a target of 0.5, so that the binary cross entropy loss becomes $-0.5\log(\hat{y})-0.5\log(1-\hat{y})$.
  4. Allow the labeller to pick a percentage on a sliding scale (or maybe from multiple choices: 0%, 25%, 50%, 75%, 100%), and take that into account when calculating the cross entropy (as with the 0.5 target in option 3).
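
To make options 3 and 4 concrete, here is a minimal sketch of binary cross entropy with a soft target, using only the standard library. The function name `soft_binary_cross_entropy`, the clipping constant `eps`, and the example prediction of 0.8 are my own illustrative assumptions, not taken from any paper or library.

```python
import math


def soft_binary_cross_entropy(y_soft, y_hat, eps=1e-12):
    """Binary cross entropy for a soft target y_soft in [0, 1].

    y_hat is the predicted probability of the positive class.
    eps is an assumed clipping constant that keeps log() finite.
    """
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -(y_soft * math.log(y_hat) + (1.0 - y_soft) * math.log(1.0 - y_hat))


# Option 3: an "unsure" label is mapped to a target of 0.5.
loss_unsure = soft_binary_cross_entropy(0.5, 0.8)

# Option 4: the labeller picks from a fixed scale of confidences.
scale = [0.0, 0.25, 0.5, 0.75, 1.0]
losses = [soft_binary_cross_entropy(t, 0.8) for t in scale]
```

One consequence worth noting: for a fixed soft target $t$, this loss is minimised at $\hat{y} = t$, so a 0.5 target pushes the model towards predicting 0.5 on those examples rather than ignoring them.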

I recently saw a paper which goes for option 2, although that's not enough to convince me. Here's the relevant quote:

> In case of a high-level risk, collision is imminent and the driver must react in less than 0.5 s (TTC < 0.5s). For low-level risk, the TTC is more than 2.0 s (TTC > 2.0s). Videos that show intermediate-level risk (0.5 s ≤ TTC ≤ 2.0 s), which is a mixture of high- and low-level risks, were not included in the NIDB because when training a convnet, it must be possible to make a clear visual distinction of risk.
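
As I read it, the rule in that quote amounts to something like the sketch below. The thresholds come from the quote; the function name `label_from_ttc`, the 1/0 label encoding, and the clip names and TTC values are hypothetical and only there for illustration.

```python
def label_from_ttc(ttc_seconds):
    """Map a TTC value (in seconds) to a binary risk label.

    Returns 1 for high risk (TTC < 0.5 s), 0 for low risk (TTC > 2.0 s),
    and None for the intermediate band, which is dropped (option 2).
    """
    if ttc_seconds < 0.5:
        return 1
    if ttc_seconds > 2.0:
        return 0
    return None


# Hypothetical clips as (name, TTC in seconds) pairs.
clips = [("clip_a", 0.3), ("clip_b", 1.2), ("clip_c", 2.5)]

# Keep only the clips that receive a clear label.
training_set = [
    (name, label_from_ttc(ttc))
    for name, ttc in clips
    if label_from_ttc(ttc) is not None
]
```

With these made-up values, `clip_b` falls in the intermediate band and never reaches the training set, which is exactly the information that options 3 and 4 would try to keep.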

Alexander Soare
  • Your problem seems to be related to [weakly supervised learning](https://academic.oup.com/nsr/article/5/1/44/4093912) (WSL). Maybe you get an answer to your question by reading about WSL. – nbro Jan 23 '21 at 23:34
  • Duplicates: https://stats.stackexchange.com/questions/74042/classifier-for-uncertain-class-labels and https://stats.stackexchange.com/questions/218656/classification-with-noisy-labels – kjetil b halvorsen Feb 02 '21 at 18:31
