Big Data and Criminal Justice

The debate about the criminal justice system is increasingly driven by empirical studies. Phil Dixon wrote thoughtfully last week about a new analysis of 700,000 drug arrests conducted by UNC faculty members outside the School of Government. This article by a Georgia law professor is also attracting attention – it claims to be “the most substantial empirical analysis of misdemeanor case processing to date,” based on “multiple court-record datasets, covering several million cases across eight diverse jurisdictions.” Similarly, in the popular media, this Washington Post article analyzes a huge trove of data to determine the percentage of arrests in each county across the nation that are based on marijuana possession.

I could list many more examples, but the general point is one with which I suspect most readers will agree: that big data is revolutionizing the discussion of criminal justice. This transformation has been unfolding for decades. Drivers include the growth of law and economics and other law and social science approaches, which has fertilized the legal field with social science techniques, and the increasing availability of large datasets, which has made statistical analysis easier. This post offers a few thoughts about the costs and benefits of this new data-focused world.

North Carolina may be headed further down the road of big data. Several Republican legislators, including those with ties to law enforcement and the courts, recently filed H 885. The bill would require several departments of state government to “conduct a statewide study to identify the criminal justice data elements currently collected and maintained by jails, courts, and prisons” in order to “identify gaps in data” and find “solutions for improving availability and accessibility of data to inform public policy.” If it passes, more empirical analyses will surely result.

That’s a good thing. Data analysis can help us break free of our biases, questionable intuitions, and preconceptions. For example, because women experience a variety of disadvantages in the workplace and in society at large, some might expect that women would also be disadvantaged in the criminal justice system. However, the statistical evidence reveals that women are treated more favorably than men at sentencing and otherwise, as noted here and here.

Well-designed empirical research has illuminated or may illuminate other issues, like what has driven rising incarceration rates, why crime rates have declined, and whether prosecutors, police officers, judicial officials and other system actors exhibit implicit bias against various racial groups. This information should be welcome even when it is challenging or unflattering. Just as we expect our physicians to prescribe evidence-based treatment for our maladies, it is reasonable to diagnose and treat the criminal justice system based on the empirical facts.

But we should be skeptical consumers of data. Doing good empirical research is difficult, time-consuming, and sometimes expensive. So researchers may do smaller studies and hype the results. Does chocolate cure Alzheimer’s? Probably not, even though a small and preliminary study suggested possible benefits. Even the larger studies typically published in leading medical journals often suffer from various flaws and limitations, according to Harvard and UCLA researchers. And many studies have yielded different results when researchers repeated them, contributing to the so-called “replication crisis” in the social sciences.

It can be hard for those of us without extensive backgrounds in research design and statistics to sort the good studies from the bad. That’s especially important because the statistical arms race may lead advocates for various policy positions to throw together outcome-driven “studies” in order to tout the evidence-based nature of their preferred policies. So at a minimum, we should be cautious consumers of empirical analyses, asking questions like: Who conducted this study? How large is the sample? If it is a survey, could the wording of the questions have influenced the results? Have the results been replicated by other researchers?

Use in the system vs. use to study the system. This post is focused on the use of statistical and empirical evidence to study, analyze, and reform the criminal justice system. But it is worth noting that big data is also being used with increasing frequency within the criminal justice system. For example, police resources may be allocated based on algorithms regarding where crimes are likely to occur; judicial officials may consult risk assessment tools in setting conditions of pretrial release; and probation departments may use statistically supported instruments to determine how to supervise offenders. All of these implementations are controversial, with critics generally arguing that using big data does not increase the accuracy of decision-making and may even reinforce existing biases.

Conclusion. Big data is here to stay, and using data skeptically and appropriately is something that all architects of, and participants in, the criminal justice system need to learn to do. As Mark Twain famously remarked, “facts are stubborn things, but statistics are pliable.”

4 thoughts on “Big Data and Criminal Justice”

  1. “Numbers are like hookers. Once you get them on the sheets you can make them say and do whatever you want.”

    – My statistics professor

  2. IF the AOC is going to keep statistics and sell them to the media and researchers, they should at least be accurate. Careers might depend on them.
    But this will not be the case as long as the State clings to a case-based system instead of a defendant-based system.
    The following is a classic example of how the AOC data system failed to reflect reality.
    While I was still DA, Forsyth indicted a father on 9 charges for rape and sexual offenses committed against his 3 daughters. The crimes occurred over many years. The mother and children, one by one over time, eventually “escaped” from the man and all reunited in Atlanta. There the children revealed the abuse to school officials. Law enforcement investigated and the man was arrested.
    For the trial we had to rent a van to bring the 3 victims and their mother to the trial and to put them up in a hotel for the week. A fight broke out between the family of the mother and the family of the father during the trial. Our tenacious sex offense prosecutor, Pansy Glanton, got hit by someone during the courtroom melee but kept on truckin’.
    When the verdicts came in, the first 6 charges were not guilty. I was really stressed. But the last 3 verdicts were guilty. Relief. The Court sentenced the defendant to life plus 10 plus 10 consecutively. All in all, a tremendous victory for Pansy on a logistically and legally difficult case.
    I wondered how inaccurately the AOC would score the matter. When our quarterly numbers came out, they showed we had 9 sex offense trials for the quarter. Of those, 6 ended in verdicts of not guilty and 3 in verdicts of guilty. These were all for one defendant, who will die in prison.
    Now you tell me how valid this AOC case-based database is to any researcher? This guy was held accountable because Pansy did a fabulous job. But the AOC shows us losing two-thirds of our sex offense trials for a three-month period.
    How’d you like to run for reelection on that platform?

    • Hear! Hear!

      The summary data from AC/IS is suspect, since some LEAs have figured out that they can look more productive by charging single offenses across multiple warrants, generating multiple CR numbers.

      You also have an issue with data integrity, as charge codes are frequently not updated to reflect the actual offense charged/indicted/convicted, or are incorrectly entered to begin with. There is also an issue with exactly how each charge code is defined. While I presume that an attorney in AOC’s General Counsel’s office is the one actually determining what is a crime, the requirement that there be multiple codes for what is essentially the same crime (felony larceny) does not make sense.

      Also, the mess that AOC makes with Hispanic / Latino surnames makes it almost impossible to reliably match up offenders using just name and dob.
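      To make the matching problem concrete, here is a minimal sketch (hypothetical records and a hypothetical `match_key` function, not AOC's actual schema) of why exact matching on name plus DOB splinters one person into several when compound or accented surnames are entered inconsistently:

      ```python
      # Hypothetical illustration: one offender, three data-entry variants.
      # A naive biographic key treats each variant as a different person.

      def match_key(first, last, dob):
          """Naive biographic key: exact (first, last, dob) after trivial cleanup."""
          return (first.strip().lower(), last.strip().lower(), dob)

      # The same person, as entered across three separate cases:
      records = [
          ("Juan", "Garcia Lopez", "1980-05-01"),   # both surnames, no accents
          ("Juan", "García-López", "1980-05-01"),   # accents and a hyphen
          ("Juan", "Lopez",        "1980-05-01"),   # maternal surname dropped
      ]

      keys = {match_key(f, l, d) for f, l, d in records}
      print(len(keys))  # 3 distinct keys -> counted as three different offenders
      ```

      Lowercasing and whitespace-stripping cannot reconcile dropped surnames, hyphens, or diacritics, which is why a biometric identifier is a more reliable join key than name plus DOB.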

      Basically, AOC needs to rebuild the system *from the ground up* and come up with a more consistent data model. They also need to provide more training for both officers and magistrates, emphasizing not only correct data entry but also that it is worth the extra few seconds to double-check whether the offender is already in the system.

      You know the most telling sign of the outdated thinking in AOC’s TSD? The fact that they still use 10 point Courier for their interface font.

  3. “Also, the mess that AOC makes with Hispanic / Latino surnames makes it almost impossible to reliably match up offenders using just name and dob.”

    That’s an astute observation of an issue that extends well beyond surnames and AOC analytics. One solution would be to transition to a system that is biometrically rather than biographically dependent. That would require amending 15A-502, which could actually solve multiple underappreciated problems. The discretionary gap created by the current statute has significant ripple effects for NC public safety.

