
For supervised learning, humans have to label the training images in the first place, so computers will probably get wrong the same images that humans get wrong. If so, can computers ever beat humans?

dan dan

1 Answer


When researchers claim "better than human accuracy", they are demonstrating that a computer can beat an individual human on a test. That is possible because the ground-truth labels are more accurate than any single human labelling the images alone would be.

There are at least two major ways that ground truth labels can beat an individual human on image tasks.

  1. Additional information is available from the same source as the image. For instance, many pictures of pets in the ImageNet database are labeled with a specific breed of animal because of how they were sourced. Most people who are not experts in pet breeds will score quite badly on a test to identify dog breeds at the fine-grained level that ImageNet presents.

  2. Ground truth based on expert opinion can be sourced from multiple experts and their opinions combined. This approach can independently be shown to be more reliable than the opinion of a single person.
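As a concrete illustration of point 2, here is a minimal Python sketch of combining several annotators' opinions by majority vote. The labels and the `aggregate_labels` helper are hypothetical, not from any real dataset pipeline:

```python
from collections import Counter

def aggregate_labels(annotations):
    """Combine several annotators' labels for one image by majority vote.

    Returns the winning label and the fraction of annotators who agreed
    with it, which can serve as a rough confidence estimate.
    """
    votes = Counter(annotations)
    label, count = votes.most_common(1)[0]
    return label, count / len(annotations)

# Three hypothetical experts label the same photo; two out of three agree:
label, agreement = aggregate_labels(["beagle", "beagle", "basset hound"])
print(label, agreement)  # "beagle" wins with 2/3 agreement
```

With enough independent annotators, the majority label is more reliable than any individual annotator, which is exactly why aggregated ground truth can exceed single-human accuracy.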

So, in short: yes, computers can beat humans when they have had access to better ground truth, and that is possible even if that ground truth was generated by humans.

However, in general your concern stands: ground-truth data is a limiting factor. It might be possible in theory for a model to achieve better accuracy on the "real" task than the ground truth of a supervised learning task allows, but this is next to impossible to prove, and other concerns, such as shifts between the distribution of real data and that of the training data, are usually more important at that level of accuracy.

Neil Slater
  • I seem to recall that "human crowdsourcing" is/was a solution in captcha identification, in the sense that a single human is not reliable, but a large sample yields correct results. (Makes me wonder if the same technique can be applied to algorithms, where any one algorithm may not be reliable, but an expanding group of unique algorithms might become more reliable.) – DukeZhou Jan 17 '18 at 00:55
  • @DukeZhou This is used in theoretical computer science. When you can't solve a problem efficiently, often instead you generate a random number and use that number to cheat in some way that allows you to solve the problem efficiently. However, because of the cheat it's not necessarily correct. What you do then is write a formula for the probability of being wrong, run the program a bunch of times, and get a small error term by aggregating all of the results. – Stella Biderman Feb 14 '18 at 07:29
  • For example, checking if two numbers are equal is hard. It requires looking at every single digit, and some numbers have a huge number of digits. Instead, I can generate a random prime with 1000 digits and check if a = b mod p. Calculating a mod p and b mod p can be done really easily, and then to check the comparison I only have to compare 1000-digit numbers, which is a constant independent of a and b. If a and b are actually equal, then they're equal mod p too, but sometimes I'll get the answer wrong by accident because of the algebraic relationship between a and b. – Stella Biderman Feb 14 '18 at 07:32
  • However, I can do this many times with different randomly generated prime numbers. I say that a != b if I found a prime such that a != b mod p. This is guaranteed to be correct, when I output it. When I don't find a disproof, I guess that a = b. This will sometimes be wrong. However, it turns out that the rate at which the error goes to 0 as the number of primes I guess increases is exceptionally fast. – Stella Biderman Feb 14 '18 at 07:34
  • That's all really interesting. I once heard a story of people guessing the number of objects in a glass jar, and the average of all the guesses was very close to the actual number. This might be another example of what you're talking about. – dan dan Mar 31 '18 at 13:27
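The modular-fingerprint trick described in the comments above can be sketched in a few lines of Python. This is a toy illustration only: it uses a naive trial-division primality test and small random primes, whereas the comments envision 1000-digit primes (for Python integers, plain `==` is of course already cheap; the point is the technique):

```python
import random

def is_prime(n):
    """Naive trial-division primality test, adequate only for small n."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def random_prime(lo=10**3, hi=10**4):
    """Draw random integers until one is prime (small bounds, for illustration)."""
    while True:
        p = random.randrange(lo, hi)
        if is_prime(p):
            return p

def probably_equal(a, b, trials=20):
    """Randomized equality test by fingerprinting.

    A "not equal" answer is always correct: if a % p != b % p for some
    prime p, then a != b.  An "equal" answer can occasionally be wrong,
    but the error probability shrinks rapidly as trials increases.
    """
    for _ in range(trials):
        p = random_prime()
        if a % p != b % p:
            return False  # definitely unequal
    return True  # equal with high probability

big = 12345678901234567890
print(probably_equal(big, big))      # equal numbers always agree mod p
print(probably_equal(big, big + 1))  # numbers differing by 1 disagree mod every prime
```

Aggregating many independent random trials to drive the error rate down is the same principle as aggregating many annotators' labels: individually unreliable checks combine into a reliable one.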