I've come across a few binary classification problems lately where the labelling was challenging even for an expert. I'm wondering how best to handle these ambiguous cases. Here are a few ideas to get the ball rolling:
- Make a third category called "unsure" then make it a three-class classification problem instead.
- Make a third category called "unsure" and just remove these from your training set.
- Make a third category called "unsure" and during training model this as a 0.5 such that the binary cross entropy loss looks like $-0.5\log(\hat{y})-0.5\log(1-\hat{y})$
- Allow the labeller to pick a percentage on a sliding scale (or maybe a multiple choice of 0%, 25%, 50%, 75%, 100%) and use that value as the soft target when calculating the cross entropy (as in the point above); see the sketch after this list.
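To make options 3 and 4 concrete, here's a minimal sketch assuming PyTorch; the tensor values and variable names are just made up for illustration. The point is that `binary_cross_entropy_with_logits` already accepts fractional targets, so an "unsure" label (or a sliding-scale percentage) can be fed in directly as a soft target.

```python
import torch
import torch.nn.functional as F

# Raw model outputs (logits) for three hypothetical examples.
logits = torch.tensor([2.0, -1.2, 0.3])

# Soft targets: 1.0 = confident positive, 0.5 = "unsure",
# 0.25 = labeller thinks "probably negative".
targets = torch.tensor([1.0, 0.5, 0.25])

# Fractional targets are allowed here, so this computes
# -y*log(sigmoid(z)) - (1-y)*log(1-sigmoid(z)) per example,
# which reduces to -0.5*log(p) - 0.5*log(1-p) when y = 0.5.
loss = F.binary_cross_entropy_with_logits(logits, targets)
print(loss.item())
```

One appealing property of this approach (if it works for your problem) is that it keeps every labelled example in the training set instead of throwing the ambiguous ones away.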
I recently saw a paper that goes for option 2, although that alone isn't enough to convince me. Here's the relevant quote:
> In case of a high-level risk, collision is imminent and the driver must react in less than 0.5 s (TTC < 0.5s). For low-level risk, the TTC is more than 2.0 s (TTC > 2.0s). Videos that show intermediate-level risk (0.5 s ≤ TTC ≤ 2.0 s), which is a mixture of high- and low-level risks, were not included in the NIDB because when training a convnet, it must be possible to make a clear visual distinction of risk.