I came across a video titled "You can buy things with your face in China". In the video, a woman scanned her face at a vending machine to buy a drink, using nothing but her face. The description claimed that each face is a digital ID linked to China's CBDC. (link to the video)

But I can't understand how this is possible from a technical viewpoint. For this to work, the model would need to classify hundreds of thousands of faces, which doesn't sound feasible to me. The ImageNet dataset has only 1000 classes and the Google JFT dataset has only 30K labels!

Is it feasible to perform facial recognition on hundreds of thousands of individuals?

Peyman
  • What makes you think there is an AI with hundreds of thousands of classes behind this technology? Fingerprint detection seems to work to a pretty good extent, so why shouldn't face recognition? – N. Kiefer Apr 18 '23 at 06:35

1 Answer

Whilst identification can be viewed strictly as a classification problem, at scale it is typically solved as a regression problem to a description vector, followed by a lookup of the nearest vectors outside of the AI (e.g. with a database search optimised to find nearest multi-dimensional vectors). To help with this approach, the regression system is trained to minimise the distance between pictures of objects with the same identity and maximise the distance between pictures of objects with different identities.
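To make the lookup step concrete, here is a minimal sketch of identification by nearest-vector search. It is an illustration under assumptions, not how any particular deployed system works: the random `gallery` stands in for description vectors that a trained network would produce, and the 128 dimensions, the 0.6 distance threshold and the names `normalize` and `identify` are all hypothetical choices.

```python
import numpy as np

def normalize(v):
    """Scale vectors to unit length so distances are comparable."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def identify(query, gallery, threshold=0.6):
    """Return the index of the nearest enrolled vector, or None if it is too far away."""
    distances = np.linalg.norm(gallery - query, axis=1)
    best = int(np.argmin(distances))
    return best if distances[best] < threshold else None

rng = np.random.default_rng(0)

# Assumption: random unit vectors stand in for the network's description vectors,
# one row per enrolled person.
gallery = normalize(rng.normal(size=(10_000, 128)))

# A new photo of enrolled person 1234: the same vector plus a little noise.
probe_known = normalize(gallery[1234] + 0.01 * rng.normal(size=128))
# A person who was never enrolled: an unrelated random vector.
probe_unknown = normalize(rng.normal(size=128))

match_known = identify(probe_known, gallery)      # index of the enrolled person
match_unknown = identify(probe_unknown, gallery)  # None, nothing is close enough
```

Note that enrolling another person only appends a row to `gallery`; the network has no per-person output class, so it does not need retraining. The brute-force scan here would normally be replaced by an indexed nearest-neighbour search once the gallery gets large.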

Here's a reference to the triplet loss training approach used to do this.

In brief, the training database requires at least 2 images of each subject. Triplets of images are selected: an anchor image, a second image of the same subject, and an image of a different subject, with a bias towards selecting "difficult" triplets. Each image is converted to a description vector by the neural network, and the loss compares the anchor's distance to the matching image with its distance to the non-matching one (so unlike many supervised learning schemes, the loss is not per function call, but calculated over sets of three function calls).
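As a rough sketch of the loss itself, one common formulation is max(0, d(anchor, positive) - d(anchor, negative) + margin) with squared Euclidean distance. The margin of 0.2, the function name `triplet_loss` and the toy 2-dimensional vectors below are assumptions chosen for illustration; in real training the three vectors would come from three forward passes of the same network.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on how much closer the positive is to the anchor than the negative is."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # same identity: should be small
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # different identity: should be large
    return np.maximum(d_pos - d_neg + margin, 0.0)

# Toy description vectors (hypothetical values).
anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])   # another photo of the same subject
negative = np.array([0.0, 1.0])   # a photo of a different subject

# Already well separated, so this triplet contributes no loss.
easy = triplet_loss(anchor, positive, negative)
# Swapping the roles simulates a badly embedded triplet with a large loss.
hard = triplet_loss(anchor, negative, positive)
```

A triplet that is already separated by more than the margin gives zero loss and therefore no gradient, which is why selection is biased towards difficult triplets.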

Before triplet loss functions came into use, it was quite common to have neural networks learn to predict actual biometrics (e.g. distance between eyes) from images. This had the advantage that the description vector was meaningful, but the disadvantages of needing a lot of difficult data collection and of offering no guarantee that the chosen biometrics would be good at separating identities in practice, depending on the population.

Neil Slater