This is an Annotation for Transparent Inquiry (ATI) data project.
The annotated article can be viewed on the Publisher's Website.
Data Generation
The research project engages a story about perceptions of fairness in criminal justice decisions. The specific focus is a debate between ProPublica, a news organization, and Northpointe, the owner of a popular risk tool called COMPAS. ProPublica wrote that COMPAS was racist against blacks, while Northpointe posted a reply online rejecting that finding. These two documents were the obvious foci of the qualitative analysis because of the further media attention they attracted, the confusion their competing conclusions caused readers, and the power both companies wield in public circles. There were no barriers to retrieval, as both documents have been publicly available on their corporate websites. This public access was one of the motivators for choosing them, as it meant that they were also easily attainable by the general public, thus extending the documents’ reach and impact. Additional materials from ProPublica relating to the main debate were also freely downloadable from its website and from a third-party, open-source platform. Access to secondary source materials, comprising additional writings from Northpointe representatives that could assist in understanding Northpointe’s main document, was more limited. Because Northpointe claims trade secret protection over its tool and the underlying algorithm, its other reports were more difficult to reach. Nonetheless, largely because its clients are governmental bodies with transparency and accountability obligations, some Northpointe-associated reports were retrievable from third parties who had obtained them, mostly through Freedom of Information Act queries. Together, the primary and (retrievable) secondary sources allowed for a triangulation of themes, arguments, and conclusions.
The quantitative component uses a dataset of over 7,000 individuals, with information that was collected and compiled by ProPublica and made available to the public on GitHub. Because ProPublica gathered the data directly from criminal justice officials via Freedom of Information Act requests, the dataset is in the public domain, and thus no confidentiality issues are present. The dataset was loaded into SPSS v. 25 for data analysis.
Data Analysis
The qualitative enquiry used critical discourse analysis, which investigates the ways in which parties attempt, through their communications, to create, legitimate, rationalize, and control mutual understandings of important issues. Each of the two main discourse documents was parsed on its own merit. Yet the project was also intertextual, studying how the discourses correspond with each other and with other relevant writings by the same authors. Several more specific types of discursive strategies attracted further critical examination:
- Testing claims and rationalizations that appear to serve the speaker’s self-interest
- Examining conclusions and determining whether sufficient evidence supported them
- Revealing contradictions and/or inconsistencies within the same text and intertextually
- Assessing strategies underlying justifications and rationalizations used to promote a party’s assertions and arguments
- Noticing strategic deployment of lexical phrasings, syntax, and rhetoric
- Judging sincerity of voice and the objective consideration of alternative perspectives
Of equal importance in a critical discourse analysis is consideration of what is not addressed, that is, uncovering facts and/or topics missing from the communication. For this project, this included parsing issues that were briefly mentioned and then neglected, asserted with their significance left unstated, or not suggested at all. This task required understanding common practices in the algorithmic data science literature. The paper could have been completed with just the critical discourse analysis. However, because one of its salient findings was that the discourses overlooked numerous definitions of algorithmic fairness, the call to fill this gap seemed obvious. The availability of the same dataset used by the parties in conflict made this opportunity more appealing: calculating additional algorithmic equity equations on that dataset avoids the irregularities that would arise from comparing results across different samples. New variables were created as needed to calculate the algorithmic fairness equations. In addition to various SPSS Analyze functions (e.g., regression, crosstabs, means), online statistical calculators were used to compute z-test comparisons of proportions and t-test comparisons of means.
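The z-test comparison of proportions mentioned above can be illustrated with a minimal sketch. It is not the project's actual procedure, which relied on SPSS and online calculators; it assumes the standard pooled two-proportion z-test with a two-sided p-value, and the function name and all counts are hypothetical placeholders rather than values from the ProPublica dataset.

```python
import math


def two_proportion_z_test(count_a, n_a, count_b, n_b):
    """Pooled two-sided z-test for the difference between two proportions.

    count_a / n_a and count_b / n_b are the observed proportions in two
    groups (for example, an error rate computed separately per group).
    """
    p_a = count_a / n_a
    p_b = count_b / n_b
    # Pooled proportion under the null hypothesis of equal proportions.
    pooled = (count_a + count_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not drawn from the dataset.
    z, p = two_proportion_z_test(50, 100, 30, 100)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

A t-test comparison of means follows the same pattern, with the group means and their standard errors in place of the proportions.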
Logic of Annotation
Annotations were employed to fulfil a variety of functions, including supplementing the main text with context, observations, counter-points, analysis, and source attributions. These fall under a few categories.
Space considerations. Critical discourse analysis offers a rich method for studying speech and text. The discourse analyst wishes not simply to describe, but to critically assess, explain, and offer insights about the underlying discourses. In practice, this often means the researcher generates far more material than can comfortably be included in the final paper. As a result, many draft passages, evaluations, and issues typically need to be excised. Annotation offered opportunities to incorporate dozens of findings, explanations, and supporting materials that otherwise would have been omitted. Readers wishing to learn more than is contained within the four corners of the official, published article can review these supplementary offerings through the links.
Visuals. The annotations use multiple data sources to provide visuals that explain, illuminate, or otherwise contextualize particular points in the main body of the paper and/or in the analytic notes. For example, a conclusion that the tool was not calibrated the same for blacks and whites could be better understood with reference to a graph showing the differences in the range of risk scores between these two groups. Overall, the visuals deployed here include graphs, screenshots, page extracts, diagrams, and statistical software output.
Context. The data for the qualitative segment involved long discourses. Thus, annotations were employed to embed longer portions of quotations from the source material than was justified in the main text. This allows the reader to confirm whether quotations were taken in proper context, and thus hold the author accountable for potential errors in this regard.
Sources. Annotations incorporated extra source materials, along with quotations from them to aid the discussion. Sources that showed some indication that they might not remain permanently available in the same form or format were more likely to be archived and activated. This practice helps ensure that readers continue to have access to the third-party materials relied upon in the research, for transparency and authentication purposes.