Face Recognition: It Would Be Nice to Just Say ‘No’

Artificial Intelligence has always been a catch-all term, evolving with context and history.  Above all, AI must be magical, so when yesterday’s “intelligent system” becomes today’s everyday commodity, it’s no longer AI.  These days, AI tends to mean some combination of data analysis and machine learning, and these technologies are being deployed everywhere [4].

This spring, Paul Marks discusses one of the most controversial applications, face recognition [1].  In particular, machine learning is used to rapidly match a video image of a person to a large database of images, returning a putative identification.  Technology has advanced to the point that results can be returned in seconds from enormous datasets.  Combined with ubiquitous real time video from many sources, this could be a powerful surveillance tool.

If it really works.

And that’s one of the problems. 

As we know, machine learning is iffy, and, indeed, the results are often literally laughable.  That’s fine when you are messing around the way Janelle Shane does, but it’s a whole lot less OK when police arrest and prosecute the wrong guy.

Part of the problem is that it is difficult to assess the accuracy of a machine learning system.  It is even more difficult to predict how it will perform on real world tasks.

However, there is now strong evidence that facial recognition systems are often highly “biased”, in that they do not work equally well for all kinds of pictures.  In the real world, this means that some people, and some groups of people, will be less accurately identified.  No points awarded for guessing that recognition is less accurate for people less like the developers themselves. 

This is a gigantic problem, especially when police or other authorities act on the basis of computer identifications.  As Marks reports, there are well documented cases of serious errors.  And there is strong evidence that many face recognition systems are highly inaccurate for dark skinned people, especially females.  Which means that relying on the software results in unfair decision making.
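
To make the arithmetic concrete, here is a tiny sketch in Python, with numbers I made up (they are not from Marks’s article or any real system), showing how a respectable-looking overall accuracy can hide a much higher error rate for a smaller group.

    # Illustrative sketch with invented numbers, not measurements of any real
    # face recognition system.  The point: an impressive aggregate accuracy can
    # conceal very different error rates for different groups of people.

    # Hypothetical evaluation counts: group -> (correct identifications, attempts)
    results = {
        "group_A": (9_800, 10_000),   # 2% error rate
        "group_B": (1_700, 2_000),    # 15% error rate
    }

    total_correct = sum(correct for correct, _ in results.values())
    total_attempts = sum(attempts for _, attempts in results.values())
    print(f"Overall accuracy: {total_correct / total_attempts:.1%}")   # about 95.8%

    for group, (correct, attempts) in results.items():
        print(f"{group}: error rate {1 - correct / attempts:.1%}")

In this made-up example the headline number is about 96% accurate, while the error rate for the smaller group is more than seven times that of the larger one.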

Now, it is difficult to understand why a machine learning system gives the answers it does.  But one huge factor is the training set used for learning.  Machine learning algorithms learn to recognize what they are taught to recognize.  If the training set is incomplete or unrepresentative or inaccurate, then the machine will faithfully learn the wrong things.  (Janelle Shane has extensively explored the laughable results of this problem.)

So much depends on the training data.  Where does training data come from?  Often, it comes from datasets amassed from the internet or other sources.  Well, that should be fine.  Databases and the internet never have errors.

What kinds of errors?  Well, aside from unrepresentative sampling, images are not necessarily correctly identified.  That means that the machine learning is carefully learning the wrong answers some of the time.  The first law of computer science is, Garbage In, Garbage Out.
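
To see Garbage In, Garbage Out in miniature, here is a toy experiment of my own (synthetic data and scikit-learn, nothing from the articles cited here): flip a fraction of the training labels and watch the test accuracy fall, because the model faithfully learns whatever it is taught.

    # Illustrative sketch, not from the articles cited here: a nearest-neighbor
    # classifier trained on partly mislabeled synthetic data faithfully learns
    # whatever it is taught, wrong labels included.  Assumes scikit-learn.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=20_000, n_features=2, n_informative=2,
                               n_redundant=0, n_clusters_per_class=1,
                               class_sep=2.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    rng = np.random.default_rng(0)
    for noise in (0.0, 0.1, 0.3):
        # Flip a fraction of the *training* labels to simulate a dirty dataset.
        y_dirty = y_train.copy()
        flip = rng.random(len(y_dirty)) < noise
        y_dirty[flip] = 1 - y_dirty[flip]

        model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_dirty)
        acc = model.score(X_test, y_test)  # scored against clean test labels
        print(f"training label noise {noise:.0%}: test accuracy {acc:.1%}")

The model is not malfunctioning in any of these runs; it is doing exactly what it was trained to do, which is the whole point.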

This spring, researchers at MIT reported finding large numbers of errors in datasets that are used to develop machine learning systems [2, 3].  These datasets are large collections of images and text which have been labeled to indicate what is in the image, what the text means, and so on.

Their study showed that these datasets are rife with errors, or at least disputable labels.  While some datasets had error rates under 1%, the average was around 3%.

I’m not sure what to make of these findings.  It is hard to say what impact such discrepancies would have.  If these datasets are used for training, they will produce inaccurate ML systems.  If they are used to assess the accuracy of the learning, the measured accuracy could be off by quite a bit.  And if they are used for both training and assessment, they will overestimate the correctness of the ML.
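
As a back-of-the-envelope illustration of the assessment side (my own toy simulation, not the MIT team’s analysis), suppose a binary classifier is genuinely right 90% of the time, but it is scored against a test set in which 3% of the labels are themselves wrong:

    # Toy simulation: label errors in the *test* set distort the measured
    # accuracy of a binary classifier.  Rates are invented, with the 3% chosen
    # only to echo the average label error reported for popular benchmarks.

    import random

    random.seed(0)
    N = 100_000
    TRUE_ACCURACY = 0.90      # how often the model agrees with the true label
    LABEL_ERROR_RATE = 0.03   # how often the recorded test label is wrong

    measured_correct = 0
    for _ in range(N):
        model_right = random.random() < TRUE_ACCURACY
        label_right = random.random() >= LABEL_ERROR_RATE
        # With two classes, the model matches the recorded label exactly when
        # it and the label are both right, or both wrong.
        measured_correct += model_right == label_right

    print(f"true accuracy:     {TRUE_ACCURACY:.1%}")
    print(f"measured accuracy: {measured_correct / N:.1%}")   # roughly 87.6%

A few percent of bad labels shifts the benchmark number by a couple of points even in this simple case.  And when the very same flawed labels are used for training, the model can learn a mistake and then be rewarded for repeating it, which is where the overestimate comes from.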

Part of the point is that when we really don’t know how accurate the training dataset is, it’s hard to know what the ML is doing, even during development and benchmarking.  These systems need to be assessed in real world situations.

Preferably, face recognition systems should be shown to be accurate before they are used to arrest people or reject applicants for jobs or loans.

In the case of face recognition, major companies have pulled back from offering real time identification systems for police work.   However, other companies are still selling this technology, though we don’t know that it works any better.  We can be sure that authoritarian governments and organizations can and will deploy this software as long as it meets their needs.


Marks suggests that face recognition technology should not be used at this time because the benefits (if any) are outweighed by the flaws and uncertainty.

I wish it were this easy.  But machine learning and face recognition specifically are going to be used.  The question is how.

For me, the main lesson is that authorities should take the “magic” of face recognition with a significant grain of salt.  Like any investigative source, results from ML should be cross-checked and validated as a matter of course.

This evaluation should assume that face recognition might work for some purposes and in some situations, and not work well in others.  So it is extremely important not to accept blanket statements about accuracy or effectiveness.  The best approach is to test the system under actual field conditions.  In the case of, say, policing, this means systematically comparing the machine with competent humans and other sources of information.  This is hard work, but it is the only way to build confidence that the machine is doing all and only what we mean it to.
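
One plausible shape for that kind of field testing, sketched below in Python with entirely invented records: every machine identification is adjudicated by human reviewers, and agreement is tallied separately for each capture condition rather than reported as a single headline number.

    # Sketch of tallying a field evaluation: each machine identification is
    # compared against a human-adjudicated answer, broken down by the conditions
    # under which the image was captured.  All names and records are invented.

    from collections import defaultdict

    # Each record: (capture condition, machine's identification, human adjudication)
    field_results = [
        ("daylight, frontal",   "subject_17", "subject_17"),
        ("daylight, frontal",   "subject_03", "subject_03"),
        ("night, low-res CCTV", "subject_17", "subject_29"),
        ("night, low-res CCTV", "subject_08", "no_match"),
        # ...many more adjudicated cases...
    ]

    per_condition = defaultdict(lambda: [0, 0])   # condition -> [agreements, total]
    for condition, machine_id, human_id in field_results:
        per_condition[condition][1] += 1
        per_condition[condition][0] += machine_id == human_id

    for condition, (agree, total) in sorted(per_condition.items()):
        print(f"{condition}: machine/human agreement {agree}/{total} ({agree / total:.0%})")

Per-condition numbers like these are what make “it works” or “it doesn’t work” testable claims; a single aggregate figure would hide exactly the situations where the tool should not be trusted.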

And, by the way, it is a good idea to have realistic goals for machine learning.  Expecting magical levels of accuracy and speed, or expecting to replace human investigators with algorithms, is expecting the impossible.  Using machine learning to augment, accelerate, and cross-check other methods seems possible.


  1. Paul Marks, Can the biases in facial recognition be fixed; also, should they? Communications of the ACM, 64 (3):20–22,  2021. https://doi.org/10.1145/3446877
  2. Curtis G. Northcutt, Pervasive Label Errors in ML Datasets Destabilize Benchmarks, in The L7 Newsletter, March 29, 2021. https://l7.curtisnorthcutt.com/label-errors
  3. Curtis G. Northcutt, Anish Athalye, and Jonas Mueller, Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks. arXiv:2103.14749, 2021. https://arxiv.org/abs/2103.14749
  4. August Reed, Atypical AI, in Science Node, March 31, 2021. https://sciencenode.org/feature/Atypical%20AI.php
  5. Kyle Wiggers, MIT study finds ‘systematic’ labeling errors in popular AI benchmark datasets, in Venture Beat, March 28, 2021. https://venturebeat.com/2021/03/28/mit-study-finds-systematic-labeling-errors-in-popular-ai-benchmark-datasets/
