On the other end, for the face recognition component of its multi-biometric system, the San Francisco Police Department required that bidding companies:
- Meet specific target accuracy levels—an error rate of 1% or better;
- Provide copies of the results from all prior accuracy tests conducted by NIST in which their algorithm was evaluated;
- Upon acceptance, submit to verification tests to ensure the system “achieves the same or better accuracies than what has been achieved by relevant NIST and/or other independent and authoritative 3rd party testing;” and
- Submit to regular future accuracy testing “to reconfirm system performance and detect any degradation.”
South Sound 911 also considered accuracy in its request for face recognition proposals, requiring that: “The search results must meet a match rate of a 96% confidence rating,” and “[t]he system must have high threshold facial recognition search capability for both in-car and booking officer queries.”
b. Few agencies used trained human reviewers to bolster accuracy.
Since face recognition accuracy remains far from perfect, experts agree that a human must double-check the results of face recognition searches to ensure that they are correct. As the architect of a leading face recognition algorithm put it, “I wouldn’t like my algorithm to take someone to jail as a single source” of identifying evidence.
Simple human review of results is not enough, however. Without specialized training, human reviewers make so many mistakes that overall face recognition accuracy could actually drop when their input is taken into account. Humans instinctively match faces using a number of psychological heuristics that can become liabilities for police deployments of face recognition. For example, studies show that humans are better at recognizing people they already know and people of the same race.
As evidence of the benefits of training, one study tested the performance of Australian passport personnel, who use Cognitec’s algorithm to check for duplicate passport applications. Facial reviewers, who receive limited instruction in face matching, identified the correct match or correctly concluded there was no match only half the time; they did no better than college students. Specially trained facial examiners, however, did about 20% better.
Unfortunately, while other agencies may provide such training, the documents we received identified only eight systems that employed human gatekeepers to systematically review matches before forwarding them to officers: the FBI face recognition unit (FACE Services), the Albuquerque Police Department, the Honolulu Police Department, the Maricopa County Sheriff’s Office, the Michigan State Police, the Palm Beach County Sheriff’s Office, the Seattle Police Department, and the West Virginia Intelligence Fusion Center.
Even these systems are still not ideal. For all but two of them—the FBI face recognition unit and the Michigan State Police—the level of training required for these human gatekeepers is unclear. Some searches evade review altogether. When a Michigan State Police officer conducts a face recognition search from a mobile phone (such as for a field identification during a traffic stop), the algorithm’s results are forwarded directly to the officer without any human review. Similarly, while the FBI subjects its own searches of its database to trained human review, states requesting FBI searches of that same database are returned up to 50 candidate images without any kind of human review.
c. Human reviewer training regimes are still in their infancy.
Agencies that are eager to implement human training may encounter yet another difficulty: the techniques for manually comparing photos of faces for similarity—techniques that would inform this sort of training—are still in their infancy. The FBI’s Facial Identification Scientific Working Group (FISWG), whose members include academic institutions and law enforcement agencies at all levels of government, has developed training and standardization materials for human facial comparison.