I ended a recent post by arguing that we shouldn’t blame the media for overstating research findings when the overstatements start in the press releases that universities and journals themselves distribute. This led me to look around at more press releases for studies I remembered as getting a lot of attention.
Try it yourself: you might be surprised by all the surprise. A common narrative is that something inspires a hypothesis, researchers conduct a study to test that hypothesis, and then, more than merely finding a result that supports their hypothesis, the researchers are shocked by how big the effect turned out to be.
How should one think about all this purported surprise?
Let me offer a diagnostic: given the design of the study, how big an effect would have been necessary for the result to be publishable? If the only way somebody could have gotten a publishable result is by finding an effect about as big as what they found, doesn’t it seem fishy to claim to be shocked by it? After all, doing a study takes a lot of work. Why would anybody do it if the only reason they ended up with publishable results is that, luckily, the effect turned out to be much bigger than what they’d anticipated?
Wouldn’t you hope that if somebody goes to all the trouble of testing a hypothesis, they’d do a study that could be published as a positive finding so long as their results were merely consistent with what they were expecting?
Let’s consider this for fields, like much of sociology, in which a quasi-necessary condition for publishing a quantitative finding as positive support for a hypothesis is being able to say that it’s statistically significant, most often at the .05 level. Now say a study is published for which the p-value is only a little less than .05. Here it is obviously dodgy for researchers to claim surprise: they went ahead and did their study, but had the estimated effect been much smaller than their “surprise” result, they wouldn’t have been able to publish it.
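To put a number on how little room “surprise” has in that case, here is a quick sketch assuming a two-tailed z-test (a simplification, but one consistent with the p-value arithmetic in the rest of this post). A result at p = .04 corresponds to an estimate only about 5% larger than the smallest effect that would have cleared the .05 bar at all:

```python
import math

def z_from_p(p):
    """z-statistic corresponding to a two-tailed p-value (inverted by bisection)."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        # erfc(z / sqrt(2)) is the two-tailed p-value for a z-statistic;
        # it decreases in z, so a too-large p means z is still too small.
        if math.erfc(mid / math.sqrt(2)) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# How much bigger is a "surprising" p = .04 estimate than the smallest
# estimate that would have been significant at .05 at all?
ratio = z_from_p(0.04) / z_from_p(0.05)
print(f"p = .04 estimate is only {ratio:.3f}x the just-significant effect")
```

In other words, under this toy setup a barely-significant result is, by construction, barely bigger than the minimum publishable effect, which is exactly why claimed shock rings hollow.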
Now think of the case where observing an effect 50% bigger than expected would be a shock. Different research questions are going to differ as to what counts as surprising, but this feels like a lower-bound case for a lot of research areas I’m familiar with. In that case, we should be suspicious unless they are talking about a result with a p-value less than .003. (If the estimate is 1.5 times the smallest effect that would have been significant, its z-statistic is about 1.5 × 1.96 ≈ 2.94, which corresponds to a two-tailed p of roughly .003.) Otherwise, they would not have achieved a significant result had they merely observed an effect size in line with their expectations at the start of the study.
What about being shocked by an effect that was twice as large as anticipated? That would imply the p-value should be less than .00009, because again otherwise their expected result would not have been significant. What if shock implies an effect three times bigger than anticipated? The corresponding p-value should be below .000000004.
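The arithmetic behind those thresholds is simple if, as a rough sketch, we assume a two-tailed z-test: scaling the effect by k (with the same standard error) scales the z-statistic by k, so an effect k times the just-significant one has z ≈ k × 1.96:

```python
import math

def two_tailed_p(z):
    """Two-tailed p-value for a z-statistic under a standard normal."""
    return math.erfc(z / math.sqrt(2))

z_crit = 1.96  # just significant at the two-tailed .05 level

# An effect k times the smallest significant effect implies z = k * 1.96.
for k in (1.5, 2, 3):
    print(f"effect {k}x the just-significant size -> p = {two_tailed_p(k * z_crit):.0e}")
```

Up to rounding, this reproduces the .003, .00009, and .000000004 cutoffs above.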
[Technical aside: One could say that I’m understating the matter, because the above results imply a researcher goes ahead and does a study with 50% power. That is, I’m presuming researchers will forge ahead with a study so long as an effect around the size they have in mind has a 50/50 chance of showing up as significant. For all that work, wouldn’t you think they’d want better odds? However, given all the not-necessarily-unreasonable ways of presenting p-values above .05 as positive findings — one-tailed tests, “marginally significant”, “approaching significance”, “substantively significant” — having a 50% chance of observing p < .05 means having a better-than-50% chance of landing somewhere near .05.]
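The 50%-power claim in the aside can be checked with a toy simulation: if the estimate is unbiased and normal, and the true effect happens to sit exactly at the significance cutoff, then by symmetry half of all estimates land above it. (The specifics here, a standard error of 1 and a true effect exactly equal to 1.96, are my illustrative assumptions, not anything from the post.)

```python
import random

random.seed(0)

# Toy setup: unbiased normal estimate with standard error 1, and a true
# effect exactly equal to the two-tailed .05 critical value of 1.96.
true_effect, z_crit = 1.96, 1.96
trials = 100_000
significant = sum(random.gauss(true_effect, 1) > z_crit for _ in range(trials))
print(f"simulated power ≈ {significant / trials:.2f}")  # close to 0.5
```

Half the time the noise pushes the estimate above the cutoff, half the time below: that is what it means to run a study at 50% power for the effect you expect.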
Of course, a different problem is that researchers may begin an empirical project without any actual notion of what effect size their hypothesis implies. Or, even if they do have some idea about the size of the effect, they still have no idea what chance the study they are embarking on gives them of actually finding positive evidence for that hypothesis.
I’ve taken to calling such projects Columbian Inquiry. Like brave sailors, researchers simply point their ships at the horizon with a vague hypothesis that there’s eventually land, and perhaps they’ll have the rations and luck to get there, or perhaps not. Of course, after a long time at sea with no land in sight, sailors start to get desperate, but there’s nothing they can do. Researchers, on the other hand, have a lot more longitude–I mean, latitude–to terraform new land–I mean, publishable results–out of data that, to a less motivated researcher, might seem far more ambiguous about how it speaks to the animating hypothesis. But I’ve already said enough for one post.