The following is a guest post by Jonathan Ben-Menachem.
Two criminal justice reform heavyweights are trading blows over a seemingly arcane subject: research methods. In a tweet, Jennifer Doleac, Executive Vice President of Criminal Justice at Arnold Ventures, accused the Vera Institute of Justice of “research malpractice” for their evaluation of New York college-in-prison programs. In a response posted on Vera’s website, President Nick Turner accused Doleac of “giving comfort to the opponents of reform.”
At first glance, the study at the core of this debate doesn’t seem controversial: Vera evaluated Manhattan DA-funded college education programs for New York prisoners and found that participants were less likely to commit a new crime after exiting prison.
Vera used a method called propensity score matching, and constructed a “control” group on the basis of prisoners’ similarity to the “treatment” group. The idea behind propensity score matching is that it approximates the group composition—across race, age, gender, conviction history, and other factors—that researchers would have seen if they randomly assigned prisoners to take classes or not take classes. Despite their acknowledgment that “differences may remain across the groups,” Vera researchers contended that “any remaining differences on unobserved variables will be small.”
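For readers unfamiliar with the method, here is a minimal, purely illustrative sketch of propensity score matching. The covariates, coefficients, and data below are made up for illustration (in practice the coefficients would be estimated by fitting a logistic regression to the real data); the sketch only shows the two core steps: scoring each person's probability of enrolling, then pairing each enrollee with the non-enrollee whose score is closest.

```python
# Toy sketch of propensity score matching -- hypothetical data, not Vera's.
# Step 1: a logistic model scores each unit's probability of "treatment"
# (here, enrolling in a prison education program).
# Step 2: each treated unit is greedily matched to the unmatched control
# unit with the nearest score.

import math

def propensity(x, weights, bias):
    """Logistic score: estimated P(treatment | covariates x)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def match_nearest(treated_scores, control_scores):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    Returns a list of (treated_index, control_index) pairs; each control
    is used at most once.
    """
    available = set(range(len(control_scores)))
    pairs = []
    for i, ts in enumerate(treated_scores):
        j = min(available, key=lambda k: abs(control_scores[k] - ts))
        pairs.append((i, j))
        available.remove(j)
    return pairs

# Illustrative covariates: (age / 100, prior_convictions / 10).
# These coefficients are assumed for the example, not estimated.
weights, bias = [1.5, -2.0], -0.2
treated = [(0.30, 0.1), (0.45, 0.3)]
controls = [(0.28, 0.1), (0.60, 0.5), (0.44, 0.3)]

t_scores = [propensity(x, weights, bias) for x in treated]
c_scores = [propensity(x, weights, bias) for x in controls]
print(match_nearest(t_scores, c_scores))  # -> [(0, 0), (1, 2)]
```

The matched pairs are then compared on the outcome (here, recidivism). Doleac's objection lives in the gap this sketch makes visible: the model can only score covariates that are observed, so an unobserved trait like "motivation" never enters the score at all.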
Doleac didn’t buy it. She challenged the Vera study’s causal claim that the reduced risk of committing a new crime is due to participation in the program alone. She argued that propensity score matching could not account for potentially different “motivation and focus.” In other words, the kind of people who apply for classes are different from people who don’t apply, so the difference in outcomes can’t be attributed to prison education. Doleac’s concern is rooted in selection bias: the idea that some factor influences both a program’s results and the process by which people end up in the program.
This fight between big philanthropy and a nonprofit executive is extremely rare, and points to a broader struggle over research and politics. The Vera Institute boasts a $264 million operating budget, and while most Americans probably only began to learn about the injustices of cash bail over the past decade, Vera has been working on bail reform since the 1960s. Arnold Ventures was founded in 2010, and the organization has allocated around $400 million to criminal justice reform, some of which went to Vera.
In his statement, Turner fired back at Doleac: “propensity score matching is a widely-used, accepted, and credible research method.” In Doleac’s defense, that consensus might not exist anymore. Experts argue against the use of propensity score matching to identify causal effects without a quasi-experimental strategy (for example, studying the discontinuity before and after a policy is implemented). Although propensity score matching does have useful applications, I might have made a critique similar to Doleac if I was a peer reviewer for an academic journal.
But I’m not sure about Doleac’s claim that Vera’s study provides “no useful information,” or her broader insistence on (quasi) experimental research designs. Because “all studies on this topic use the same flawed design,” Doleac argued, “we have *no idea* whether in-prison college programming is a good investment.” This is a striking declaration that nothing outside of causal inference counts.
In a recent statement posted on Arnold’s website, Doleac claimed that the main function of descriptive research is to generate hypotheses that can be tested with “more rigorous” methods. In other words, other forms of research merely set the stage for causal inference. This is a controversial view outside of certain economics departments—it cultivates a narrow view of the social world—and it’s notable that it now appears to be the philosophy of Arnold’s criminal justice team.
This is not the first time that Doleac has sparked controversy by refusing to acknowledge research outside of economics. In 2018, Doleac and Anita Mukherjee published a working paper called “The Moral Hazard of Lifesaving Innovations: Naloxone Access, Opioid Abuse, and Crime,” which claimed that naloxone distribution fails to reduce overdose deaths while also “making riskier opioid use more appealing.” In addition to measurement problems, the moral hazard frame partly relied on an urban myth: “naloxone parties,” where opioid users stockpile naloxone, an FDA-approved medication designed to rapidly reverse overdose, and intentionally overdose with the knowledge that they can be revived. The final version of the study includes no references to “naloxone parties,” removes the moral hazard framing from the title, and describes the findings as “suggestive” rather than causal.
Later that year, Doleac and coauthors published a research review in Brookings citing her controversial naloxone study claiming that both naloxone and syringe exchange programs were unsupported by rigorous research. Opioid health researchers immediately demanded a retraction, pointing to heaps of prior research suggesting that these policies reduce overdose deaths (among other benefits). But Doleac released a statement to reporters alleging that public health researchers “collectively have so little understanding of rigorous research methods.” Brookings later distanced itself from this claim.
This is an iteration of what sociologist Marion Fourcade calls the self-declared “superiority of economists”: they see themselves at the top of the pecking order, and steamroll other social sciences like “colonists.” The Brookings paper almost exclusively cited economists rather than health researchers, and Doleac’s list of “criminal justice experts” is effectively a list of economists and people who use their methods.
Privileging causal inference methods in funding decisions is one thing, but publicly attacking researchers whose methods economists don’t find “rigorous” is another.
“Rigorous” methods have been critiqued by fellow researchers for a lack of “external validity,” which means that a finding about one prison at one point in time might not mean much for another prison at another point. They also don’t tell us much about how a cause produces an effect—such studies only detect effects and their magnitude. But in its study, Vera wants to know: how does prison education improve the wellbeing of prisoners, and would the intervention work the same way in other prisons or points in time?
It’s not clear that Vera’s study gives us much leverage to answer those questions. But Turner’s statement could be read as a defense of Vera’s pluralistic approach to policy research. While causal inference can be useful, it is insufficient on its own and arguably not always necessary in the policy context. By contrast, Vera produces research using a very wide variety of methods. This work teaches us about the who, where, when, what, why, and how of criminalization. Causal inference primarily tells us “whether.”
In a recent interview about Arnold Ventures’ funding priorities, Doleac explained that her goal is to “help build the evidence base on what works, and then push for policy change based on that evidence.” But insisting on “rigorous” evidence before implementing policy change risks slowing the steady progress of decarceration to a grinding halt. We need to dismantle mass criminalization as quickly as possible, with the biggest sledgehammers we can find–even if that makes “rigorous” research difficult to conduct.
In an email, Vera’s Turner echoed this point. “The cost of Doleac’s apparently rigid standard is that it not only devalues legitimate methods,” he wrote, “but it sets an unreasonably and unnecessarily high burden of proof to undo a system that itself has very little evidence supporting its current state.”
Indeed, mass incarceration was not built on “rigorous research.” Perhaps the single most influential theory of policing of the past 50 years—broken windows—was catalyzed by academics riffing about “the fear of being bothered by disorderly people” in a 1982 essay in The Atlantic. Yet today some philanthropists demand randomized controlled trials (or “natural experiments”) for every brick we want to remove from the wall of mass incarceration. Although applied microeconomists have succeeded in mainstreaming narrow causal inference standards, it’s not clear that policymakers share their preferences. Is it the case that policymakers prefer causal inference methods, or are economists like Doleac campaigning to make it so?
Prioritizing research may not be the best way to dismantle mass criminalization. (Both Doleac and Turner probably disagree with me.) When I wrote about then-New York Governor Andrew Cuomo’s bail reform rollbacks in 2020, I argued that fear-mongering around policy-making was completely at odds with the evidence. Indeed, the campaign for bail reform rollbacks began before reform went into effect—the tough on crime crowd typically starts swinging before researchers can even begin to evaluate a reform.
Decarceration is a fight that takes place on the streets and in city halls across America, not in the halls of philanthropic organizations. We need to help movement organizers build power and light fires underneath cautious politicians. It would be great if research contributed to that goal. But the narrow emphasis on the evaluation standards of academic economists will hamstring otherwise promising efforts to undo the harms of criminalization.
Jonathan Ben-Menachem is a PhD candidate in Sociology at Columbia University.