In the context of all of the debates about replication going on across the blogs, it might be useful to introduce a distinction: experimental vs. statistical replication.* Experimental replication is the more obvious kind: can we run a new experiment using the same methods and produce a substantially similar result? Statistical replication, on the other hand, asks, can we take the exact same data, run the same or similar statistical models, and reproduce the reported results? In other words, experimental replication is about generalizability, while statistical replication is about data manipulation and model specification.
On the one hand, sociology, economics, and political science all have ongoing issues with statistical replication. The big Reinhart and Rogoff controversy grew out of an attempt to replicate a statistical finding; that attempt revealed some unreported shenanigans in how cases were weighted and showed that some cases had simply been dropped through error. Gary King’s work on improving replication in political science aims at making this kind of replication easier, and even turning it into a standard part of the graduate curriculum. Similarly, I believe the UMass paper (failing to) replicate Reinhart and Rogoff emerged out of an econometrics class assignment that required students to statistically replicate a published finding.
On the other hand, psychology seems to have a big problem with experimental replication. Here the concerns are less about model specification (as the models are often simple, bivariate relationships) or data coding, and more about implausibly large effects and “the file drawer problem,” where published results are biased towards significance (which in turn makes replications much more likely to produce null findings).
Both of these kinds of replication are clearly important, but they present somewhat different issues. For example, Mitchell’s concern that replication will be incompetently performed, and thus produce null findings when real effects exist, makes less sense in the context of statistical replication, where the choices made by the replicator can be reported transparently and the data are shared by all researchers. So, as an attempt at an intervention, I propose we try to make clear when we’re talking about experimental replication vs. statistical replication, or if we really mean both. Perhaps we might even call the second kind of replication something else, like “statistical reproduction,”** in order to highlight that the attempt to reproduce the findings is not based on new data.
What do you all think?
* H/T Sasha Killewald for a conversation about different kinds of replication that sparked this post.
** Think “artistic reproduction” – can I repaint the same painting? Can I re-run the same models and data and produce the same results?