verifying results

Lisa Wade recently made me aware that a group of psychologists has decided to try to verify the results of every article published in three journals in 2008. I think this is a great idea. And the odds that the results will replicate look pretty slim. As the Chronicle piece I’ve linked to points out,

Recently, a scientist named C. Glenn Begley attempted to replicate 53 cancer studies he deemed landmark publications. He could only replicate six. Six! Last December I interviewed Christopher Chabris about his paper titled “Most Reported Genetic Associations with General Intelligence Are Probably False Positives.” Most!

If only 6 out of 53 cancer studies can be replicated, I shudder to think what the social science data will show (in part because there is so much more noise in a lot of our research designs). What this would mean is more of an open question to me. It could mean that researchers are generally frauds. This could take different forms. At one extreme, they could simply fabricate results. This happens, though perhaps not with great regularity; it’s hard to say because we rarely do replication studies. On the other hand, it could be something less nefarious. Scholars hunt and hunt for a significant result, and once they find one that is interesting enough, they publish. But such results may be more a product of the search than of anything actually observed. Or, for experimental work, it could simply mean that the published result (or the replication) is an outlier, or that social conditions have changed in ways that influence the likelihood of observing a result (I think of performativity here, particularly for famous results).
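The “hunt and hunt” mechanism can be illustrated with a small simulation. This is a hedged sketch, not a model of any study discussed here: it assumes the true effect is zero, that outcomes are normally distributed with a known standard deviation of 1, and that each new specification is an independent test on fresh data. That last assumption overstates the problem, since real re-specifications reuse one dataset, but it makes the arithmetic clean: with k independent tries at α = 0.05, the chance of at least one “significant” result is 1 − (1 − 0.05)^k, about 0.23 for k = 5 and 0.64 for k = 20. The function names `two_sample_p_value` and `false_positive_rate` are made up for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def two_sample_p_value(a, b):
    """Two-sided z-test p-value for a difference in means.

    Assumes both groups have a known standard deviation of 1,
    which is true here by construction.
    """
    se = math.sqrt(1 / len(a) + 1 / len(b))
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(k_specs, n_studies=1000, n=30, alpha=0.05, seed=0):
    """Share of simulated studies that report a 'significant' result.

    The true effect is zero in every study. The (hypothetical) researcher
    tries up to k_specs specifications and stops at the first p < alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        for _ in range(k_specs):
            # Simplifying assumption: each specification draws fresh,
            # independent null data. Real specifications reuse one dataset
            # and are correlated, which lowers (but does not remove) the inflation.
            a = [rng.gauss(0, 1) for _ in range(n)]
            b = [rng.gauss(0, 1) for _ in range(n)]
            if two_sample_p_value(a, b) < alpha:
                hits += 1
                break  # publish the first "finding" and stop searching
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 5, 20):
        simulated = false_positive_rate(k)
        expected = 1 - (1 - 0.05) ** k
        print(f"k={k:>2}: simulated {simulated:.3f}, formula {expected:.3f}")
```

The simulated rates should track the formula up to sampling noise. With correlated specifications the rate would generally land somewhere between α and the independent-tests figure: smaller, but still inflated.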

Now it may seem unusual to hear an ethnographer bring up the problem of replicability in social science work (although, as an aside, I’ll point out that I also do work no one knows about: experimental work! And I have new such work under review). For an ethnographer, dealing with this kind of stuff is tricky. At first I thought I would put my fieldnotes online. But that proved impossible, because there were some terribly personal things in there that could hurt teenagers; further, my notes weren’t anonymized. So instead I decided not to make the site of my study anonymous. Here I fall a bit into the Mitch Duneier school, which suggests that anonymity is more a guise to protect the researcher than the subjects. And so you can dig up my subjects. You can go to the school. You can find the newspaper reports of incidents I talk about (the hazing scandal, for one).

In fact, ethnographers have a bit of a history of verifying results. The Duneier/Klinenberg debate would be one example. And in many ways the Chicago School is very much about revisiting and re-examining results, often in the same neighborhoods. I am not arguing here that ethnography is better than other methods at replicating results. I’m simply pointing out that the common refrain that my main kind of work can’t be replicated is hardly accurate.

As for other work, I wonder. What if a group of us decided to try to replicate every finding published in two major journals in 2008? What if we used available data to see whether we could reproduce the same statistical results? Went back to communities to see if what was reported actually happened (à la Duneier)? Revisited interviews or historical records? Would we, like the cancer researcher, be able to replicate only 6 out of 53 studies? And if that were the case, what would it mean? I would opine that this would be worse for folks using available data, because they would not have the same protections as we experimentalists or ethnographers (the world has changed, one of these results is an outlier, and it could be your replication). But still, it would be an interesting task.

6 thoughts on “verifying results”

  1. A few years back, in a methods class, we were tasked with replicating a study from ASR or AJS. It was extremely difficult. I think we need to be more specific about exactly what was done to reach a given result. Folks are much more interested in the answer than in the process, however. But the process, to me (and I’m sure to others), is crucial to understanding the answer.


  2. I have long thought that we need more replication. My belief is that the way to do this is to assign replication to grad students as an exercise. Replicating a study is an excellent way to learn skills. And then to have a repository of reports on successful or unsuccessful replications. People could announce that they plan to replicate study X (so there would not be too much duplication of effort) and then report the results. Not a full paper, no peer review, just a methods section and results.


  3. I love it, too. Replicating an experiment is very different from checking someone’s analysis of secondary data. Both are important, but aside from fraud they would uncover different kinds of problems. In experiments you might be selectively publishing an outlier. In secondary data analysis you might be making a mistake, or trying 100 interaction combinations till something comes up significant. That’s harder to uncover in a replication; the answer is probably designing sensitivity tests, I guess.


  4. > “C. Glenn Begley attempted to replicate 53 cancer studies he deemed landmark publications. He could only replicate six. Six!”

    I wonder what this means. Deciding to replicate 53 landmark cancer studies (with patients, protocols, and control groups? With lab work? What?) sounds like it’d be next to impossible for one research group. Maybe in this case it was a data reanalysis? I suppose I should just LMGTFY.


  5. For his undergrad thesis, my son tried to replicate a leading study of the effects of teachers’ unions by economist Caroline Hoxby. Hoxby found that unions increase costs and lower performance. But my son, using mostly the same data sources, ran the regressions and found that with even slight changes in the model or in the measure of a variable, those relationships disappeared. Or as he put it, “if the results are so sensitive to slight changes in the model specification, then the robustness of Hoxby’s results is questionable.” (FWIW, the Economics department gave him high honors. I gave him my AFT pin.)


  6. Non-experimental economists tend to use the word “replication” to refer to whether independent (re-)analysis of the same data yields the same results, whereas psychologists and other/real experimental scientists tend to use “replication” to refer to whether analyses of different data yield the same results.

    I’m a (far down the list) co-author on the Chabris paper mentioned in this post. The author of the Chronicle article seems to marvel at the use of “Most” in the title meaning “majority,” which is a little amusing because a more accurate title for the paper might have replaced “Most” with “Perhaps All”.

