E. Racial Bias

“Q: Is the Booking Photo Comparison System biased against minorities[?]”

“A: No… it does not see race, sex, orientation or age. The software is matching distance and patterns only, not skin color, age or sex of an individual.”

- Frequently Asked Questions, Seattle Police Department

Human vision is biased: We are good at identifying members of our own race or ethnicity, and by comparison, bad at identifying almost everyone else.214 Yet many agencies using face recognition believe that machine vision is immune to human bias. In the words of one Washington police department, face recognition simply “does not see race.”215

The reality is far more complicated. Studies of racial bias in face recognition algorithms are few and far between. The research that has been done, however, suggests that these systems do, in fact, show signs of bias. The most prominent study, co-authored by an FBI expert, found that several leading algorithms performed worse on African Americans, women, and young adults than on Caucasians, men, and older people, respectively.216 In interviews, we were surprised to find that two major face recognition companies did not test their algorithms for racial bias.217

Racial bias intrinsic to an algorithm may be compounded by outside factors. African Americans are disproportionately likely to come into contact with—and be arrested by—law enforcement.218 This means that police face recognition may be overused on the segment of the population on which it underperforms. It also means that African Americans will likely be overrepresented in mug shot-based face recognition databases. Finally, when algorithms search these databases, the task of selecting a final match is often left to humans, a step that may reintroduce human bias into the system.

  • 214. See, e.g., Gustave A. Feingold, The Influence of Environment on Identification of Persons and Things,  5 J. of the Am. Inst. of Crim. L. & Criminology 39, 50 (May 1914-March 1915) (“Now it is well known that, other things being equal, individuals of a given race are distinguishable from each other in proportion to our familiarity, to our contact with the race as a whole.”); Luca Vizioli, Guillaume A. Rousselet, Roberto Caldara, Neural Repetition Suppression to Identity is Abolished by Other-Race Faces, 107 Proc. of the Nat’l Acad. of Sci. of the U.S., 20081, 20081 (2010), http://www.pnas.org/content/107/46/20081.abstract. This problem is known as the “other-race” effect. Id.
  • 215. See Seattle Police Department, Booking Photo Comparison System FAQs, Document p. 009377. In 2009, Scott McCallum, then a systems analyst for the Pinellas County Sheriff’s Office face recognition system, made the same claim to the Tampa Bay Times: “[The software] is oblivious to things like a person’s hairstyle, gender, race or age,” McCallum said. Kameel Stanley, Face recognition technology proving effective for Pinellas deputies, Tampa Bay Times, July 17, 2009, http://www.tampabay.com/news/publicsafety/crime/facial-recognition-technology-proving-effective-for-pinellas-deputies/1019492.
  • 216. See Brendan F. Klare et al., Face Recognition Performance: Role of Demographic Information, 7 IEEE Transactions on Information Forensics and Security 1789, 1797 (2012) (hereinafter “Klare et al.”).
  • 217. See Interview with Face Recognition Company Engineer (Anonymous) (Mar. 9, 2016) (notes on file with authors); Interview with Face Recognition Company Engineer (Anonymous) (Mar. 16, 2016) (notes on file with authors).
  • 218. See, e.g., Brad Heath, Racial Gap in U.S. Arrest Rates: ‘Staggering Disparity’, USA Today, Nov. 19, 2014, http://www.usatoday.com/story/news/nation/2014/11/18/ferguson-black-arrest-rates/19043207.

1. Face recognition algorithms exhibit racial bias.

Despite the lack of extensive public and independent testing, several studies have uncovered racial bias in face recognition algorithms. In 2011, researchers used the algorithms and images from a 2006 NIST competition to compare accuracy on subjects of East Asian and Caucasian descent.219 They found that algorithms developed in East Asia performed better on East Asians, while algorithms developed in Western Europe and the U.S. performed better on Caucasians. This result suggests that algorithms may be most accurate on the populations who developed them—a concerning effect given that software engineers in the United States are predominantly Caucasian males.220

The 2012 FBI-coauthored study tested three commercial algorithms on mug shots from Pinellas County, Florida.221 The companies tested include the suppliers of algorithms to the Los Angeles County Sheriff, the Maryland Department of Public Safety, the Michigan State Police, the Pennsylvania Justice Network, and the San Diego Association of Governments (SANDAG), which runs a system used by 28 law enforcement agencies within San Diego County.222

All three of the algorithms were 5 to 10% less accurate on African Americans than on Caucasians. To be more precise, African Americans were less likely to be successfully identified—i.e., more likely to be falsely rejected—than other demographic groups.223 A similar decline surfaced for females as compared to males224 and younger subjects as compared to older subjects.225

In one instance, a commercial algorithm failed to identify Caucasian subjects 11% of the time but failed to identify African American subjects 19% of the time—a nearly twofold increase in failures. To put this in more concrete terms, if the perpetrator of a crime were African American, the algorithm would be almost twice as likely to miss the perpetrator entirely, causing the police to lose out on a valuable lead.

Depending on how a system is configured, this effect could lead the police to misidentify the suspect and investigate the wrong person. Many systems return the top few matches for a given suspect no matter how poor those matches are. If the suspect is African American rather than Caucasian, the system is more likely to fail to identify the right person, potentially causing innocent people to be bumped up the list—and possibly even investigated. Even if the suspect is simply knocked a few spots lower on the list, it means that, according to the face recognition system, innocent people will look like better matches.
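To make the candidate-list mechanics concrete, the sketch below shows a minimal gallery search that always returns the top k candidates by similarity score, with no minimum-score threshold. It is an illustration of the configuration described above, not any vendor’s actual implementation; the function name, the cosine-similarity measure, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def top_k_candidates(probe, gallery, ids, k=5):
    """Return the k gallery identities most similar to the probe.

    Illustrative sketch only: similarity is cosine similarity between
    precomputed feature vectors, and no minimum-score threshold is
    applied, so a candidate list comes back even when every match is poor.
    """
    gallery = np.asarray(gallery, dtype=float)
    probe = np.asarray(probe, dtype=float)
    scores = gallery @ probe / (
        np.linalg.norm(gallery, axis=1) * np.linalg.norm(probe) + 1e-12
    )
    best_first = np.argsort(scores)[::-1][:k]
    return [(ids[i], float(scores[i])) for i in best_first]

# Hypothetical data: 1,000 enrolled feature vectors and an unrelated probe.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 128))
ids = [f"mugshot_{i}" for i in range(1000)]
probe = rng.normal(size=128)
print(top_k_candidates(probe, gallery, ids, k=5))   # always returns 5 "hits"
```

Because the list is never empty, a depressed score for the true match does not drop it from consideration so much as let unrelated identities rank above it, which is the failure mode described above.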

  • 219. See P. Jonathon Phillips et al., An Other-Race Effect for Face Recognition Algorithms, 8 ACM Transactions on Applied Perception 14:1, 14:5 (2011).
  • 220. See, e.g., Google Diversity, Our Workforce: Tech, https://www.google.com/diversity/ (last visited Sept. 22, 2016) (showing the 2015 tech workforce to be 81% male and 57% white); Maxine Williams, Facebook Diversity Update: Positive Hiring Trends Show Progress, Facebook (July 14, 2016), http://newsroom.fb.com/news/2016/07/facebook-diversity-update-positive-hiring-trends-show-progress/ (showing that the tech workforce is currently 83% male and 48% white—a plurality).
  • 221. See Klare et al., above note 216, at 1789.
  • 222. As of Feb. 13, 2015, there were approximately 800 registered users of TACIDS from 28 law enforcement agencies in the San Diego area. SANDAG, Board of Directors Agenda Item 2 (Feb. 13, 2015), Document p. 005699.
  • 223. See Klare et al., above note 216, at 1797. A few studies have contradicted this result. G. H. Givens et al., How Features of the Human Face Affect Recognition: A Statistical Comparison of Three Face Recognition Algorithms, Computer Vision and Pattern Recognition (2004) found that African American and Asian subjects were easier to recognize, but did so using primitive academic algorithms that are a decade older than those from the 2012 study. Those algorithms were trained and tested on images collected for the FERET dataset in 1993–96. Patrick J. Grother et al., Multiple-Biometric Evaluation (MBE) 2010, Report on the Evaluation of 2D Still-Image Face Recognition Algorithms, NIST Interagency Report 7709 at 55–56, National Institute of Standards and Technology (Aug. 24, 2011), http://ws680.nist.gov/publication/get_pdf.cfm?pub_id=905968 also found that “blacks were easier to recognize than whites for 5 of the 6 algorithms” tested in the study, three of which were the same commercial algorithms as those tested by Klare et al. However, the MBE 2010 study provides only a single graph and a paragraph of analysis to support this finding. We rely on the analysis by Klare et al., which was more systematic, comprehensive, and thorough in the way it presented its findings.
  • 224. See Klare et al., above note 216, at 1797. This finding is also supported by P. Jonathon Phillips et al., Face Recognition Vendor Test 2002: Evaluation Report (Mar. 2003) at 26–28, http://www.face-rec.org/vendors/FRVT_2002_Evaluation_Report.pdf and Patrick J. Grother et al., Multiple-Biometric Evaluation (MBE) 2010, Report on the Evaluation of 2D Still-Image Face Recognition Algorithms, NIST Interagency Report 7709 at 51, National Institute of Standards and Technology (Aug. 24, 2011), http://ws680.nist.gov/publication/get_pdf.cfm?pub_id=905968.
  • 225. See Klare et al., above note 216, at 1798. This finding is also supported by P. Jonathon Phillips et al., Face Recognition Vendor Test 2002: Evaluation Report (Mar. 2003) at 29, http://ws680.nist.gov/publication/get_pdf.cfm?pub_id=50767. This result is contradicted by Patrick J. Grother et al., Multiple-Biometric Evaluation (MBE) 2010, Report on the Evaluation of 2D Still-Image Face Recognition Algorithms, NIST Interagency Report 7709 at 51–52, National Institute of Standards and Technology (Aug. 24, 2011), http://ws680.nist.gov/publication/get_pdf.cfm?pub_id=905968, which found no prevailing effect.

5-10%

Lower accuracy rates for African Americans and women, as measured in an FBI co-authored 2012 study.

There are various explanations for this bias. The simplest is that training is destiny; the faces that an algorithm practices on are the faces it will be best at recognizing. When those faces disproportionately represent one race, an algorithm will optimize its accuracy for that group at the expense of others. Notably, in addition to testing three commercial algorithms, the 2012 study also tested an academic algorithm that was trained three separate times, each time exclusively on a single cohort (Caucasians, African Americans, or Latinos); in each case it performed best on the cohort on which it was trained.226

The authors of the 2012 study suggest another contributing factor: Some demographics may be inherently more difficult to recognize than others. For example, they hypothesize that cosmetics could make it harder to match photos of women.227 Several technologists we spoke to mentioned that photos of people with darker skin tend to have less color contrast, making it harder to extract the features that algorithms use to compare faces.228
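As a rough illustration of the contrast point, the sketch below computes two simple measures of how much usable variation a grayscale face crop contains. The metrics, the 0–255 pixel scale, and the example image are assumptions made for this illustration; they are not drawn from any vendor’s pipeline.

```python
import numpy as np

def contrast_metrics(face_crop):
    """Rough contrast measures for a grayscale face crop (pixel values 0-255).

    Illustrative only: a narrow 5th-95th percentile range or a low RMS
    contrast indicates reduced dynamic range, which leaves less variation
    for a feature extractor to work with.
    """
    pixels = np.asarray(face_crop, dtype=float).ravel()
    p5, p95 = np.percentile(pixels, [5, 95])
    percentile_range = p95 - p5          # spread of most pixel values
    rms_contrast = pixels.std() / 255.0  # normalized standard deviation
    return percentile_range, rms_contrast

# Hypothetical underexposed crop: values clustered in a narrow dark band.
dark_crop = np.random.default_rng(1).integers(10, 60, size=(112, 112))
print(contrast_metrics(dark_crop))
```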

Finally, bias may be the inadvertent result of intentional design decisions. Engineers may deliberately tune an algorithm to perform well on certain demographics, potentially at the expense of others.

  • 226. See Klare et al., above note 216, at 1800 (“Face recognition performance on race/ethnicity…generally improves when training exclusively on that same cohort.”).
  • 227. See Klare et al., above note 216, at 1797 (“These results strongly suggest that the female cohort is inherently more difficult to recognize.”).
  • 228. Interview with Face Recognition Company Engineer (Anonymous) (Mar. 9, 2016) (“when you have people with very dark skin, you have a lower dynamic range, which means that it’s much harder to capture high-quality images. . .  This is one reason why the performance on black subjects has typically been worse”) (notes on file with authors).
Figure 11: Pennsylvania Justice Network, “JNET Facial Recognition User Guide Version 1.8” (Dec. 4, 2014)

As an example of a design choice that may result in bias, a 2014 handbook for users of the Pennsylvania Justice Network (JNET) face recognition system instructs users on how to generate a three-dimensional model of a face using software from a company called Animetrics. To generate the model, users must enter the race or ethnicity of the subject. But as described in the handbook, the JNET system’s only options are “Generic Male, Generic Female, Asian Male, Asian Female, Caucasian Male, Caucasian Female or Middle Eastern Male.”229 As of 2015, African Americans and Latinos comprised 11.7% and 6.8% of Pennsylvanians, respectively; neither group appears on that list. While this is only one of many tools used in Pennsylvania’s face recognition software suite, it excludes a significant portion of the state’s population—and, potentially, the communities most likely to encounter law enforcement.

  • 229. See Pennsylvania JNET, JNET Facial Recognition User Guide Version 1.8 (Dec. 4, 2014), Document pp. 010879–010883.
  • 230. See U.S. Census, Quick Facts: Pennsylvania, http://www.census.gov/quickfacts/table/PST045215/42#headnote-js-b (last accessed July 24, 2016).

2. Face recognition algorithms are not being tested for racial bias.

The scientific literature on racial bias in face recognition is sparse. The two studies discussed in this section represent some of the only lines of work to investigate this phenomenon. NIST, which has run a face recognition competition every three to four years since the mid-1990s, has tested for racial bias just once.231 The problem may be related to demand: Even jurisdictions like the San Francisco Police Department—which required prospective face recognition vendors to demonstrate target accuracy levels, provide documentation of performance on all applicable accuracy tests, and submit to regular future accuracy tests—did not ask companies to test for racially biased error rates.232

This state of affairs is not limited to the government or academia. In the spring of 2016, we conducted interviews with two of the nation’s leading face recognition vendors for law enforcement to ask them how they identify and seek to correct racially disparate error rates. At that time, engineers at neither company could point to tests that explicitly checked for racial bias. Instead, they explained that they use diverse training data and assume that this produces unbiased algorithms.233

  • 231. See Patrick J. Grother et al., Multiple-Biometric Evaluation (MBE) 2010, Report on the Evaluation of 2D Still-Image Face Recognition Algorithms, NIST Interagency Report 7709 at 55–56, National Institute of Standards and Technology (Aug. 24, 2011), http://ws680.nist.gov/publication/get_pdf.cfm?pub_id=905968.
  • 232. San Francisco Police Department, Request for Proposal—Automated Biometric Identification System, Section 02: Technical Specifications (Mar. 31, 2009), Document pp. 005555–005557.
  • 233. Interview with Face Recognition Company Engineer (Anonymous) (Mar. 9, 2016) (notes on file with authors); Interview with Face Recognition Company Engineer (Anonymous) (Mar. 16, 2016) (notes on file with authors). In order to obtain candid responses, we assured employees at these companies that their answers would be reported anonymously. A third company declined to be interviewed without a non-disclosure agreement that would prohibit publication of their responses.
Engineers at two of the nation’s leading face recognition companies indicated that they did not explicitly test their systems for racial bias.
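The kind of test those engineers could not point to is straightforward to describe. The sketch below is a hypothetical example, not any vendor’s protocol: given comparison scores for genuine image pairs labeled by demographic group, it computes the false non-match rate (missed true matches) for each group separately, which is the basic disaggregation the 2012 study performed. The function name, threshold, and toy scores are all assumptions.

```python
from collections import defaultdict

def false_non_match_rates(genuine_scores, threshold):
    """False non-match rate per demographic group.

    genuine_scores: iterable of (group_label, score) pairs, one for each
    genuine comparison (two images of the same person).
    threshold: comparison scores below this value count as non-matches.
    Hypothetical sketch; a real evaluation would also report false match
    rates and confidence intervals for every group.
    """
    totals, misses = defaultdict(int), defaultdict(int)
    for group, score in genuine_scores:
        totals[group] += 1
        if score < threshold:
            misses[group] += 1
    return {group: misses[group] / totals[group] for group in totals}

# Toy scores chosen to show the shape of a disparity, not real data.
scores = [("Caucasian", 0.91), ("Caucasian", 0.72),
          ("African American", 0.62), ("African American", 0.58)]
print(false_non_match_rates(scores, threshold=0.60))
# {'Caucasian': 0.0, 'African American': 0.5}
```

Running this kind of disaggregated report on a representative test set is what testing for racial bias means in practice; the obstacle, as the next paragraph explains, is that suitably diverse test sets are scarce.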

This problem may trace, in part, to a lack of diverse photo datasets that could be used to test for racially biased errors. The 2011 study, for example, likely compared only Caucasians and East Asians because its database was composed of photos of undergraduate volunteers who were 77% Caucasian, 14% Asian, and 9% “other or unknown.”234 The 2012 study also tested the algorithms on Hispanics, but the results were erratic due to “the insufficient number of training samples available.”235 This situation is the norm in face recognition—diverse collections of photos that accurately capture communities of interest to law enforcement are in short supply. This deficiency reduces the reliability of testing regimes for existing systems and makes it more difficult to train new algorithms.

  • 234. Flynn et al., Lessons from Collecting a Million Biometric Samples, University of Notre Dame/National Institute of Standards and Technology, https://www3.nd.edu/~kwb/Flynn_Phillips_Bowyer_FG_2015.pdf.
  • 235. See Klare et al., above note 216, at 1798.

3. African Americans are disproportionately likely to be subject to police face recognition.

A face recognition system can only “find” people who are in its database; in systems that rely on mug shot databases, racial disparities in arrest rates will make African Americans much more “findable” than others—even though those identifications may themselves be more likely to be erroneous.

Ratio of African American arrest rates to population share in select jurisdictions:

  • Arizona: 3:1
  • Hawaii: 2:1
  • L.A. County: 3:1
  • Michigan: 2:1
  • Minnesota: 5:1
  • Pennsylvania: 3:1
  • San Diego County: 3:1
  • Virginia: 2:1

Sources: U.S. Census, Minnesota Department of Public Safety, King County Department of Adult and Juvenile Detention, Pennsylvania Uniform Crime Reporting System, State of California Department of Justice Office of the Attorney General, Virginia State Police, Arizona Department of Public Safety236

Many of the agencies that reported using mug shot databases (alone or in conjunction with driver’s license and ID photos) operate in jurisdictions with dramatic racial disparities in arrest rates. For example, in 2014, African Americans represented 5.4% of Minnesota’s population but 24.5% of those arrested. In contrast, Caucasians were 82.1% of the population but 57.0% of those arrested.237 A Center on Juvenile and Criminal Justice fact sheet notes that “African American women, 5.8 percent of San Francisco’s total female population, constituted 45.5 percent of all female arrests in 2013.”238
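The ratios in the figure above follow directly from figures like these: each is the arrest share divided by the population share, rounded to the nearest whole number. The short script below, using the percentages cited in note 236, reproduces the table; it is included only to show the arithmetic.

```python
# African American arrest share (%) and population share (%), taken from
# the sources cited in note 236.
jurisdictions = {
    "Arizona":          (11.34, 4.2),
    "Hawaii":           (4.0, 1.9),
    "L.A. County":      (22.94, 8.34),
    "Michigan":         (33.0, 14.0),
    "Minnesota":        (24.50, 5.4),
    "Pennsylvania":     (31.8, 10.4),
    "San Diego County": (15.12, 5.0),
    "Virginia":         (44.73, 19.3),
}

for name, (arrest_share, population_share) in jurisdictions.items():
    ratio = round(arrest_share / population_share)
    print(f"{name}: {ratio}:1")   # e.g., "Minnesota: 5:1"
```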

These statistics do not just speak to arrests. They reflect the fact that African Americans are not just more likely to be arrested: they are also more likely to be stopped, interrogated, or otherwise investigated by law enforcement. Police face recognition systems do not only perform worse on African Americans; African Americans are also more likely to be enrolled in those systems and subjected to their processing.

A natural response to the enrollment problem might be to move away from mug shot databases and instead use driver’s license and ID photo databases, which may better reflect the overall population in a jurisdiction. As this report explains, however, this results in the creation of a dragnet biometric database of law-abiding citizens—a shift that is unprecedented in the history of federal law enforcement and raises profound privacy issues. Face recognition presents some problems for which there are no easy answers.

  • 236. All arrest ratios have been rounded to the nearest whole number. Arizona Department of Public Safety, Crime in Arizona (2014), http://www.azdps.gov/about/reports/docs/crime_in_arizona_report_2014.pdf (11.34% of adult arrests were of African Americans in Arizona); U.S. Census Bureau, 2010-2014 American Community Survey 5-Year Estimates Arizona, https://factfinder.census.gov/bkmk/table/1.0/en/ACS/14_5YR/DP05/0400000US04 (African Americans comprised 4.2% of the population of Arizona); California Department of Justice, Office of the Attorney General, CJSC Statistics: Arrests, https://oag.ca.gov/crime/cjsc/stats/arrests (last visited Sept. 22, 2016) (22.94% of arrests in Los Angeles were of African Americans); U.S. Census Bureau, 2010-2014 American Community Survey 5-Year Estimates Los Angeles, https://factfinder.census.gov/bkmk/table/1.0/en/ACS/14_5YR/DP05/0500000US06037 (African Americans comprised 8.34% of the population of Los Angeles); Minnesota Department of Public Safety, Uniform Crime Report (2014), https://dps.mn.gov/divisions/bca/bca-divisions/mnjis/Documents/2014-MN-Crime-Book.pdf (24.50% of arrests were of African Americans in Minnesota); U.S. Census Bureau, 2010-2014 American Community Survey 5-Year Estimates Minnesota, https://factfinder.census.gov/bkmk/table/1.0/en/ACS/14_5YR/DP05/0400000US277 (African Americans comprised 5.4% of the population of Minnesota); Pennsylvania Uniform Crime Reporting System, Crime in Pennsylvania: Annual Uniform Crime Report (2014), http://www.paucrs.pa.gov/UCR/Reporting/Annual/AnnualFrames.asp?year=2014 (31.8% of arrests in Pennsylvania were of African Americans); U.S. Census Bureau, 2010-2014 American Community Survey 5-Year Estimates Pennsylvania, https://factfinder.census.gov/bkmk/table/1.0/en/ACS/14_5YR/DP05/0400000US42 (African Americans comprised 10.4% of the population of Pennsylvania); California Department of Justice, Office of the Attorney General, CJSC Statistics: Arrests, https://oag.ca.gov/crime/cjsc/stats/arrests (last visited Sept. 22, 2016) (15.12% of those arrested in San Diego County were of African Americans), U.S. Census Bureau,  2010-2014 American Community Survey 5-Year Estimates San Diego County, https://factfinder.census.gov/bkmk/table/1.0/en/ACS/14_5YR/DP05/0500000US06073 (African Americans comprised 5.0% of the population of San Diego County); Uniform Crime Reporting Section, Department of State Police, Crime in Virginia (2014), http://www.vsp.state.va.us/downloads/Crime_in_Virginia/Crime_in_Virginia_2014.pdf (44.73% of those arrested in Virginia were African American);  U.S. Census Bureau,  2010-2014 American Community Survey 5-Year Estimates Virginia, https://factfinder.census.gov/bkmk/table/1.0/en/ACS/14_5YR/DP05/0400000US51 (African Americans comprised 19.3% of the population of Virginia); Michigan State Police, Michigan Incident Crime Reporting (2014), http://www.michigan.gov/documents/msp/Annual_StatewideArrests_493231_7.pdf (33% of arrests were of African Americans in Michigan); U.S. Census Bureau, 2010-2014 American Community Survey 5-Year Estimates Michigan, https://factfinder.census.gov/bkmk/table/1.0/en/ACS/14_5YR/DP05/0400000US26 (African Americans comprised 14.0% of the population of Michigan); Hawaii Crime Prevention & Justice Assistance Division, Crime in Hawaii (2014), https://ag.hawaii.gov/cpja/files/2016/07/Crime-in-Hawaii-2014.pdf (4% of arrests were of African Americans in Hawaii); U.S. 
Census Bureau, 2010-2014 American Community Survey 5-Year Estimates Hawaii, https://factfinder.census.gov/bkmk/table/1.0/en/ACS/14_5YR/DP05/0400000US15 (African Americans comprised 1.9% of the population of Hawaii).
  • 237. Minnesota Department of Public Safety, Uniform Crime Report (2014), https://dps.mn.gov/divisions/bca/bca-divisions/mnjis/Documents/2014-MN-Crime-Book.pdf (24.50% of arrests were of African Americans in Minnesota); U.S. Census Bureau, 2010-2014 American Community Survey 5-Year Estimates Minnesota, https://factfinder.census.gov/bkmk/table/1.0/en/ACS/14_5YR/DP05/0400000US277 (African Americans comprised 5.4% of the population of Minnesota).
  • 238. Michael Males, San Francisco’s Disproportionate Arrest of African American Women Persists (Apr. 2015) at 1, http://www.cjcj.org/uploads/cjcj/documents/disproportionate_arrests_in_san_francisco.pdf.

Sidebar 8: Scoring Protections Against Racial Bias

There was too little information available to score individual agencies on their efforts to combat racial bias in their face recognition systems. The main factor in this decision was the absence of regular accuracy tests for racially biased error rates. (Many jurisdictions also failed to disaggregate arrest rates by race and ethnicity.) If NIST institutes regular accuracy tests for racial bias, however, police departments and the communities they serve should condition system purchases on an algorithm’s performance on those tests.