Emancipatory Data Science - A Critical Quantitative Framework for Data Science
By: Dr. Thema Monroe-White and Dr. Fay Cobb Payton
It is one thing to recognize that data science is mostly white and/or Asian and male (Duranton et al., 2020; Harnham, 2019), and it is quite another to 1) call attention to the problem as an extension of a much broader pattern of exclusion in the tech industry, 2) propose framing this problem as one of race and gender, and 3) recognize that those who are excluded by virtue of their demographic identities and lived experiences (e.g., minoritized and marginalized people of color) hold solutions to the pervasive socio-technical problems that the discipline continues to perpetuate.
The goal of this article is to shed light on the challenges and opportunities in data work and to highlight the unique contributions of individuals who have used their expertise to mitigate data harms for minoritized peoples. By highlighting these works, we contend that there is a more informed way forward for the data science, innovation, and research communities, which we call “emancipatory data science.”
Data science leverages statistical and machine learning approaches to create generalizable knowledge (Dhar, 2013). Data scientists collect and utilize data from, by, and about people and society. Therefore, we agree with Oberski (2020) that all data are people. The breadth and depth of data harms are real. These harms are preliminarily defined as the adverse effects caused by uses of data that may impair, injure, or impede a person’s, entity’s, or society’s interests. Caused by status-quo, majoritarian data science and logics, these harms illustrate a bias against minoritized people of color (Redden et al., 2020). Harm continues to emerge across industries and disciplines, and contemporary examples document and demonstrate these points. For instance:
· Facial recognition: Buolamwini and Gebru (2018) exposed data harms caused by facial recognition software that disproportionately misclassified darker-skinned females compared to lighter-skinned males. The impact of these systems is wide reaching, especially when used in concert with massive law-enforcement databases with images of over 117 million US adults (>50% of the population) for police and government surveillance. These systems have resulted in false identifications of innocent suspects (Garvie et al., 2016).
· Search engine algorithms: Bias in search engine traffic is well documented (Fortunato et al., 2006). Noble (2018) found that terms like “Black girls” and “Black women” yielded pornographic and profane first-page results in Google searches.
· Recidivism risk models: ProPublica investigative journalists found that Black defendants, when subjected to the COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) tool, were twice as likely as white defendants to be misclassified as higher risk. These misclassifications created a pernicious feedback loop that directly perpetuated the continual marginalization and oppression of Black people in the US (Angwin et al., 2016; Dressel & Farid, 2018; Flores et al., 2016; O’Neil, 2016).
· Bias in hiring algorithms: Yarger, Payton, and Neupane (2019) examined AI hiring applications and their impacts on underrepresented IT talent. They noted several areas where bias in the talent acquisition process intersected with socio-cultural notions of equity. While societal systems that perpetuate inequities persist, a lack of diversity in the data talent building AI systems is a notable limitation. Design justice, an ethical approach centering the sociotechnical perspective, can improve the design and audit of AI systems.
In addition to the data harms described above, the literature reveals the following examples:
· Health care algorithms that reduce the number of Black patients identified for additional care by more than 50% as compared to white patients with similar conditions (Benjamin, 2019; Obermeyer et al., 2019; Payton, 2021).
· Predictive policing and smart cities approaches that led to the reinforcement of racist policing despite the ineffectiveness of broken windows policies (Harcourt & Ludwig, 2006; Kitchin et al., 2019; Munn, 2018; Van Zoonen, 2016).
· U.S. credit markets that disproportionately and negatively impact communities of color (Gillis & Spiess, 2019).
· Bias in education, where algorithms assign grades that favor students from private schools and penalize students from areas with a lower socioeconomic status (Adams & McIntyre, 2020; Smith, 2020).
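Disparities like those in the examples above are often quantified by comparing error rates across demographic groups. The following is a minimal sketch, not the method used in any of the cited studies, of how a per-group false positive rate (the share of people who did not have the outcome but were still flagged as high risk) might be computed. The function name, the group labels "A" and "B", and the toy records are all hypothetical and invented purely for illustration.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Return {group: false positive rate}.

    Each record is (group, predicted_high_risk, had_outcome).
    FPR = flagged-but-no-outcome / all-with-no-outcome, per group.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted_high_risk, had_outcome in records:
        if not had_outcome:
            negatives[group] += 1
            if predicted_high_risk:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items() if n > 0}


# Hypothetical toy data (not drawn from any real dataset).
toy_records = [
    ("A", True, False), ("A", True, False), ("A", False, False),
    ("A", False, False), ("A", True, True),
    ("B", True, False), ("B", False, False), ("B", False, False),
    ("B", False, False), ("B", True, True),
]

rates = false_positive_rates(toy_records)
disparity_ratio = rates["A"] / rates["B"]
print(rates)            # per-group false positive rates
print(disparity_ratio)  # how many times more often group A is wrongly flagged
```

A ratio well above 1 in an audit of this kind signals that one group bears more of the model's mistakes. On its own, however, a single metric cannot capture the historical and social context that the questions later in this article call for.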
In addition to the contemporary examples above, there is a historical precedent for using data to improve the condition of minoritized people in the US context. In 1895, Ida B. Wells, the staunch anti-lynching advocate, leveraged publicly available data published in the Chicago Tribune. These data included the lynching victim’s name, race, gender, geographic location, and purported reason, and Wells analyzed these data to debunk the normative white supremacist rhetoric used to justify these terroristic acts against Black Americans. Other 20th-century cases include Mary Church Terrell, one of the founding members of the NAACP, who leveraged data to create an aggressive social reform platform focused on establishing childcare centers for working Black women while improving workplace conditions for these mothers. In addition, Dr. Anna Julia Cooper provided an array of statistics on Black educational institutions (including early childhood development, colleges, and universities) and on teachers, ministers, and other professionals (e.g., doctors and lawyers) to make the case for the “unassisted effort of the colored people for self-development” (Lemert & Bahn, 1998, p. 169). Cooper evaluated these data to assess high mortality rates, the economic disempowerment of sharecropping systems, and the limited housing opportunities among Black Americans.
Given these contemporary and historical examples (and there are numerous others), the goal of emancipating data science and associated fields (e.g., artificial intelligence and machine learning) from the pervasive data harms listed above requires empowering minoritized and marginalized people via data practices that prioritize their well-being for a greater public good. This requires that the field(s) leverage data systems to address the preceding and consequential reinforcement of systemic socio-economic barriers (race, gender, class, etc.). In the cases explored above, these emancipatory data scientists not only identified biases in data, but they also attempted to change the conditions that created these biases and to create awareness of them.
Looking ahead, what should the emancipatory data scientist do? We propose the following set of questions to guide those seeking to embrace an emancipatory trajectory with their data work:
· What protections are in place for critical technologists and data scientists?
· What systems can we devise to fully embrace critique within data science and the scientific process?
· How do we best capture context via small data to understand algorithmic results and inform decision-making?
· How do we marry the “data” results with historical context to inform decision-making?
Future-Ready Data Science 3.0
With technological advances such as the metaverse, blockchain, and Web3, data and all its uses will undoubtedly endure. The need for emancipatory data scientists has heightened, and will only continue to heighten, given the ubiquitous nature of tech’s insertion into the lived experience. Future-Ready thinking raises questions that warrant present attention and solutions, as articulated below:
· In what ways do data scientists imagine their work serving a justice orientation? What are the personal/individual rewards and potential penalties of doing work that may run counter to broader organizational, disciplinary, and societal objectives? Where is the accountability for the people and organizations that cause harm, and how is accountability enforced?
· What institutions are using emancipatory data methods and frameworks to actively train, recruit, and retain Black, Indigenous, and marginalized data scientists of color, and what can we learn from them? What are the primary sources of support needed (personal, professional, financial, etc.) for emancipatory data scientists to thrive?
· How do emancipatory data scientists receive acknowledgment for their efforts? How can the field shift towards ecosystems (Payton et al., 2021(b)) that foster creation and ownership of tech ventures by Black, Indigenous, and marginalized persons?
· How can these ventures use data science to address lived experiences, uncover and address harmful conditions facing marginalized groups, and receive start-up funding parity?
Ultimately, answers to these questions will vary from one context to another; however, what is shared is the fact that without a data-literate and data-capable stakeholder base that is reflective of the diverse public that data products ultimately aim to serve, the institutions creating and/or acting on these models risk reproducing and exacerbating data harms. These models, hence, lack social justice and design justice and regrettably cultivate notions of algorithmic neutrality and objectivity.
In the cases noted above, which explore data science across various domains, these scholar-activists not only identified biases in data, but they also attempted to change the conditions that created these biases. This is, however, not an exhaustive list. We challenge the community to identify the many different expressions of emancipatory data science, both past and present, and the ways in which it is being (or can be) realized to inform current and future knowledge.
References
Adams, R., & McIntyre, N. (2020, August 13). England A-level downgrades hit pupils from disadvantaged areas hardest. The Guardian. https://www.theguardian.com/education/2020/aug/13/england-a-level-downgrades-hit-pupils-from-disadvantaged-areas-hardest
Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016, May 23). Machine Bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Benjamin, R. (2019). Assessing risk, automating racism. Science, 366(6464), 421–422. https://doi.org/10.1126/science.aaz3873
Buolamwini, J., & Gebru, T. (2018). Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. Proceedings of the 1st Conference on Fairness, Accountability and Transparency, 77–91. https://proceedings.mlr.press/v81/buolamwini18a.html
Dhar, V. (2013). Data science and prediction. Communications of the ACM, 56(12), 64–73. https://doi.org/10.1145/2500499
Dressel, J., & Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4(1). https://www.science.org/doi/10.1126/sciadv.aao5580
Duranton, S., Erlebach, J., Brégé, C., Danziger, J., Gallego, A., & Pauly, M. (2020, December 18). What’s Keeping Women Out of Data Science? BCG Global. https://www.bcg.com/publications/2020/what-keeps-women-out-data-science
Flores, A. W., Bechtel, K., & Lowenkamp, C. (2016). False Positives, False Negatives, and False Analyses: A Rejoinder to “Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And it’s Biased Against Blacks.” Federal Probation, 80.
Fortunato, S., Flammini, A., Menczer, F., & Vespignani, A. (2006). Topical interests and the mitigation of search engine bias. Proceedings of the National Academy of Sciences, 103(34), 12684–12689. https://doi.org/10.1073/pnas.0605525103
Garvie, C., Bedoya, A., & Frankle, J. (2016, October 18). The Perpetual Line-Up. Georgetown Law Center on Privacy & Technology. https://www.perpetuallineup.org/
Gillis, T. B., & Spiess, J. L. (2019). Big Data and Discrimination. The University of Chicago Law Review, 86(2), 459–488.
Harcourt, B., & Ludwig, J. (2006). Broken Windows: New Evidence from New York City and a Five-City Social Experiment. University of Chicago Law Review, 73.
Kitchin, R., Cardullo, P., & Di Feliciantonio, C. (2019). Citizenship, Justice, and the Right to the Smart City. In P. Cardullo, C. Di Feliciantonio, & R. Kitchin (Eds.), The Right to the Smart City (pp. 1–24). Emerald Publishing Limited. https://doi.org/10.1108/978-1-78769-139-120191001
Lemert, C., & Bahn, E. (1998). The Voice of Anna Julia Cooper. Rowman and Littlefield.
Munn, N. (2018, June 11). This Predictive Policing Company Compares Its Software to ‘Broken Windows’ Policing. Vice. https://www.vice.com/en/article/d3k5pv/predpol-predictive-policing-broken-windows-theory-chicago-lucy-parsons
Noble, S. U. (2018). Algorithms of Oppression: How Search Engines Reinforce Racism. NYU Press. https://nyupress.org/9781479837243/algorithms-of-oppression
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447–453. https://doi.org/10.1126/science.aax2342
Oberski, D. (2020). Human Data Science. Patterns, 1(4), 100069. https://doi.org/10.1016/j.patter.2020.100069
O’Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books.
Payton, F. C. (2021). Racial Bias in Health Care Artificial Intelligence. NIHCM. https://nihcm.org/publications/artificial-intelligences-racial-bias-in-health-care
Payton, F. C., Yarger, L., & Mbarika, V. (2022). Black Lives Matter: A perspective from three Black information systems scholars. Information Systems Journal, 32. https://doi.org/10.1111/isj.12342
Redden, J., Brand, J., & Terzieva, V. (2017, December 6). Data Harm Record. Data Justice Lab. https://datajusticelab.org/data-harm-record/
Smith, H. (2020). Algorithmic bias: Should students pay the price? AI & Society, 35(4), 1077–1078. https://doi.org/10.1007/s00146-020-01054-3
USA Diversity in Data and Analytics: A review of diversity within the data and analytics industry in 2019. (2019). Harnham Report. http://www.harnham.com/harnham-data-analytics-diversity-report
van Zoonen, L. (2016). Privacy concerns in smart cities. Government Information Quarterly, 33(3), 472–480. https://doi.org/10.1016/j.giq.2016.06.004
Yarger, L., Cobb Payton, F., & Neupane, B. (2019). Algorithmic equity in the hiring of underrepresented IT job candidates. Online Information Review, 44(2), 383–395. https://doi.org/10.1108/OIR-10-2018-0334