I've come across a few binary classification problems lately where the labelling was challenging even for an expert. I'm wondering how best to handle these ambiguous cases. Here are a few ideas to get the ball rolling:
- Make a third category called "unsure" then make it a three-class classification problem instead.
- Make a third category called "unsure" and just remove these from your training set.
- Make a third category called "unsure" and during training model this as a 0.5 such that the binary cross entropy loss looks like $-0.5\log(\hat{y})-0.5\log(1-\hat{y})$
- Allow the labeller to pick a percentage on a sliding scale (or maybe a multiple choice of 0%, 25%, 50%, 75%, 100%) and use that value as the soft target when calculating the cross entropy (as in the point above); see the sketch after this list.
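To make options 3 and 4 concrete, here's a minimal sketch assuming PyTorch; the tensor values and variable names are just made up for illustration. The point is that `binary_cross_entropy_with_logits` already accepts fractional targets, so an "unsure" label (or a sliding-scale percentage) can be fed in directly as a soft target.

```python
import torch
import torch.nn.functional as F

# Raw model outputs (logits) for three hypothetical examples.
logits = torch.tensor([2.0, -1.2, 0.3])

# Soft targets: 1.0 = confident positive, 0.5 = "unsure",
# 0.25 = labeller thinks "probably negative".
targets = torch.tensor([1.0, 0.5, 0.25])

# Fractional targets are allowed here, so this computes
# -y*log(sigmoid(z)) - (1-y)*log(1-sigmoid(z)) per example,
# which reduces to -0.5*log(p) - 0.5*log(1-p) when y = 0.5.
loss = F.binary_cross_entropy_with_logits(logits, targets)
print(loss.item())
```

One appealing property of this approach (if it works for your problem) is that it keeps every labelled example in the training set instead of throwing the ambiguous ones away.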
I recently saw a paper that goes for option 2, although that alone isn't enough to convince me. Here's the relevant quote:
> In case of a high-level risk, collision is imminent and the driver must react in less than 0.5 s (TTC < 0.5s). For low-level risk, the TTC is more than 2.0 s (TTC > 2.0s). Videos that show intermediate-level risk (0.5 s ≤ TTC ≤ 2.0 s), which is a mixture of high- and low-level risks, were not included in the NIDB because when training a convnet, it must be possible to make a clear visual distinction of risk.