My last post raised some comments about one-tailed tests versus two-tailed tests, including a post by LJ Zigerell here. I’ve returned today to an out-of-control inbox after several days of snorkeling, sea kayaking, and spotting platypuses, so I haven’t given this much new thought.
Whatever philosophical grounding exists for one-tailed vs. two-tailed tests is vitiated by the reality that, in practice, one-tailed tests are largely invoked so that one can talk about results in a way that one would be precluded from doing if held to two-tailed p-values. Gabriel notes that this is p-hacking, and he’s right. But it’s p-hacking of the best sort, because it’s right out in the open and doesn’t change the magnitude of the coefficient.* So it’s vastly preferable to any hidden practice that biases the coefficient upward to get it below the two-tailed p < .05.
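The arithmetic behind this move is simple and fully transparent, which is the point. A minimal sketch (my own illustration, with a hypothetical z-statistic): for a symmetric test statistic, the one-tailed p-value in the hypothesized direction is exactly half the two-tailed p-value, so a coefficient that misses the two-tailed .05 cutoff can clear the one-tailed one without anything about the estimate changing.

```python
from statistics import NormalDist

z = 1.75  # hypothetical z-statistic for some coefficient

# Survival function of the standard normal: P(Z >= z)
sf = lambda x: 1.0 - NormalDist().cdf(x)

p_two_tailed = 2 * sf(abs(z))  # ~0.080: misses .05
p_one_tailed = sf(z)           # ~0.040: clears .05 in the hypothesized direction

# The one-tailed p is exactly half the two-tailed p for a symmetric statistic.
assert abs(p_one_tailed - p_two_tailed / 2) < 1e-12
```

Nothing is hidden here: the coefficient and its standard error are untouched, and anyone can halve (or double) the reported p-value themselves.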
In general, I’ve largely swung to the view that practices that allow people to talk about results near .05 as providing sort-of evidence for a hypothesis are better than the mischief caused by using .05 as a gatekeeper for whether or not results can get into journals. What keeps me from committing to this position is that I’m not sure it doesn’t just change the situation so that .10 becomes the gatekeeper. In any event: if we are sticking to a world of p-values and hypothesis testing, I suspect I would be much happier in a world in which investigators were expected to articulate what would constitute a substantively trivial effect with respect to a hypothesis, and then use a directional test against that.
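To make that last suggestion concrete, here is a sketch of the sort of directional test I have in mind (the function name and all the numbers are hypothetical, and I use a normal approximation for simplicity): instead of testing against a null of exactly zero, the investigator pre-specifies the largest effect they would regard as substantively trivial and tests whether the estimate exceeds it.

```python
from statistics import NormalDist

def directional_p(beta_hat, se, trivial_effect):
    """One-sided p-value for H0: beta <= trivial_effect vs. H1: beta > trivial_effect,
    using a normal approximation to the sampling distribution of beta_hat."""
    z = (beta_hat - trivial_effect) / se
    return NormalDist().cdf(-z)  # = P(Z >= z)

# Hypothetical coefficient that clears the usual zero null easily...
p_zero_null = directional_p(beta_hat=0.30, se=0.12, trivial_effect=0.0)     # ~0.006

# ...but not a null set at a pre-registered "trivially small" effect of 0.15.
p_trivial_null = directional_p(beta_hat=0.30, se=0.12, trivial_effect=0.15)  # ~0.106
```

The appeal of this setup is that the directionality is doing honest work: rejecting means the effect is credibly larger than something we agreed in advance not to care about, rather than merely credibly different from an exact zero nobody believed anyway.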
* I make this argument as a side point in a conditionally accepted paper, the main point of which will be saved for another day.