I hope it will not disrupt the statistical discussions launched by Jeremy to launch a new line of discussion. My goal is to improve the culture of publication and coauthoring in my department. Although some of our students do great on this, others languish, and many of our students complain that they do not get enough mentoring about publishing. I have identified as one problem that many faculty consider it “exploitative” to involve students in their research if they are not being paid. Another problem is wide variation in opinions about the level of involvement that merits a coauthorship. What I want to do is to develop a set of normative guidelines for apprentice-like experiences that do not involve payment, as well as guidelines for those that do. I am working up a draft of this and would appreciate comments and reports on good and bad experiences and practices in other programs. So here is my draft. Comments, please. Continue reading “coauthoring norms 1: assisting and junior authoring”
No plan to make a habit of linking to Buzzfeed, but: Orange is the New Black lines inserted into Peanuts strips (HT:RCM). Involves lines from the current season, but no plot spoilers or even recognizable plot points. Mostly liked it because it made me nostalgic and appreciative of how brilliantly angsty the Peanuts aesthetic was for its time.
Nice post by Paul Allison about the conditions under which listwise deletion does as well or better than multiple imputation (HT: Richard Williams).
What the post doesn’t mention is the crazy rabbit hole that multiple imputation can send students down, often for results that are basically the same. It’s not that missing data can’t cause big problems for estimation, but that many of these problems aren’t solved by multiple imputation either. But it feels like the analyst is doing more about the problem, and it takes just seconds to demand as a reviewer.
I haven’t followed the Facebook study kerfuffle in any detail, nor have I looked at the study itself. But the ethics of the study have really bothered folks. I do think Facebook is incredibly creepy for the information and power they possess, so I can see why folks’ creepometers would be super-sensitive to Facebook experiments.* Still, I don’t get the freakout. Or, at least, there are existing research designs behavioral scientists use that I’ve already decided that I’m okay with, so it’s hard for me to understand the outrage about the Facebook experiment. Three examples: Continue reading “the ethics of that Facebook study”
“It’s like that story of the tortoise and the hare, and I’m the hare. Only I’m a hare who races ahead, takes a break, and then sort-of forgets about the race entirely. So I start a totally new race against a totally new tortoise. Then at some point, usually in the dead of night, I realize that I’m in like five races against five turtles with five different finish lines, and they’re all about to text to say they’ve reached the end and where the hell am I?”
First thing: is it in The Goldilocks Spot? For this, do not look at the result itself, but look at the confidence interval around the result. Ask yourself two questions:
1. Is the lower bound of the interval large enough that if that were the true effect, we wouldn’t think the result is substantively trivial?
2. Is the upper bound of the interval small enough that if that were the point estimate, we wouldn’t think the result was implausible?
(Of course, from this we move to questions about whether the study was well-designed so that the estimated effects are actually estimates of what we mean to be estimating, etc. But this seems like a first hurdle for considering whether what is presented as a positive result should be interpreted as possibly such.)
Caveat: Note that this also assumes the hypothesis is a hypothesis that the effect in question is not trivial, and hypotheses may vary in this respect.
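A rough sketch of the two questions above, in Python. The function name and both thresholds are hypothetical: what counts as "substantively trivial" or "implausible" is a judgment call for the analyst, not something the code can supply. It also assumes the hypothesized effect is positive.

```python
def in_goldilocks_spot(ci_lower, ci_upper, trivial_below, implausible_above):
    """Apply the two confidence-interval questions to a positive effect.

    trivial_below and implausible_above are hypothetical, analyst-chosen
    thresholds on the same scale as the effect estimate.
    Returns (lower_bound_ok, upper_bound_ok).
    """
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    # Question 1: would the lower bound, if true, still be non-trivial?
    lower_bound_ok = ci_lower >= trivial_below
    # Question 2: would the upper bound, if true, still be believable?
    upper_bound_ok = ci_upper <= implausible_above
    return lower_bound_ok, upper_bound_ok


# Illustrative numbers only, not taken from any study.
narrow = in_goldilocks_spot(0.10, 0.40, trivial_below=0.05, implausible_above=0.50)
wide = in_goldilocks_spot(0.01, 0.90, trivial_below=0.05, implausible_above=0.50)
print(narrow, wide)
```

An interval as wide as the second one fails both questions at once: its low end is trivial and its high end is implausible, which is the usual signature of a noisy estimate that happened to clear a significance threshold.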
My last post raised some comments about one-tailed tests versus two-tailed tests, including a post by LJ Zigerell here. I’ve returned today to an out-of-control inbox after several days of snorkeling, sea kayaking, and spotting platypuses, so I haven’t given this much new thought.
Whatever philosophical grounding there is for one-tailed vs. two-tailed tests is vitiated by the reality that, in practice, one-tailed tests are largely invoked so that one can talk about results in a way that one would be precluded from doing if held to two-tailed p-values. Gabriel notes that this is p-hacking, and he’s right. But it’s p-hacking of the best sort, because it’s right out in the open and doesn’t change the magnitude of the coefficient.* So it’s vastly preferable to any hidden practice that biases the coefficient upward to get the result below the two-tailed p < .05 threshold.
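To make the "doesn’t change the magnitude of the coefficient" point concrete, here is a minimal sketch that computes both p-values from the same estimate. It assumes a large-sample normal approximation for the test statistic, and the function name and numbers are made up for illustration.

```python
from statistics import NormalDist


def one_and_two_tailed_p(estimate, std_error):
    """Return (z, one_tailed_p, two_tailed_p) under a normal approximation.

    The one-tailed p-value is for the hypothesis that the effect is positive.
    """
    z = estimate / std_error
    standard_normal = NormalDist()
    one_tailed = 1 - standard_normal.cdf(z)
    two_tailed = 2 * (1 - standard_normal.cdf(abs(z)))
    return z, one_tailed, two_tailed


# Hypothetical coefficient and standard error.
z, p_one, p_two = one_and_two_tailed_p(estimate=0.18, std_error=0.10)
print(f"z = {z:.2f}, one-tailed p = {p_one:.3f}, two-tailed p = {p_two:.3f}")
```

With these inputs the one-tailed p-value is roughly .036 and the two-tailed one roughly .072. The estimate and its standard error are identical in both cases; only the label attached to the result changes.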
In general, I’ve largely swung to the view that practices that allow people to talk about results near .05 as providing sort-of evidence for a hypothesis are better than the mischief caused by using .05 as a gatekeeper for whether or not results can get into journals. What keeps me from committing to this position is that I’m not sure whether it just changes the situation so that .10 is the gatekeeper. In any event: if we are sticking to a world of p-values and hypothesis testing, I suspect I would be much happier in one in which investigators were expected to articulate what would comprise a substantively trivial effect with respect to a hypothesis, and then use a directional test against that.
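One way to read "a directional test against that" is a one-sided test whose null is "the effect is no bigger than the trivial threshold" rather than "the effect is zero." The sketch below takes that reading and assumes a normal approximation; the function name, the threshold, and the numbers are all hypothetical.

```python
from statistics import NormalDist


def directional_p_against_trivial(estimate, std_error, trivial_effect):
    """One-sided p-value for H0: effect <= trivial_effect.

    trivial_effect is the analyst's stated cutoff for a substantively
    trivial effect, on the same scale as the estimate.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = (estimate - trivial_effect) / std_error
    return 1 - NormalDist().cdf(z)


# Hypothetical numbers: same estimate tested against zero and against 0.10.
p_vs_zero = directional_p_against_trivial(0.30, 0.10, trivial_effect=0.0)
p_vs_trivial = directional_p_against_trivial(0.30, 0.10, trivial_effect=0.10)
print(f"vs. zero: {p_vs_zero:.4f}, vs. trivial threshold: {p_vs_trivial:.4f}")
```

Testing against the threshold is the more demanding version: the same estimate that clears zero comfortably gets a p-value of roughly .023 against a trivial effect of 0.10, and an estimate sitting right at the threshold would get .50.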
* I make this argument as a side point in a conditionally accepted paper, the main point of which will be saved for another day.