I hope it will not disrupt the statistical discussions Jeremy has launched if I start a new line of discussion. My goal is to improve the culture of publication and coauthoring in my department. Although some of our students do well at this, others languish, and many complain that they do not get enough mentoring about publishing. One problem I have identified is that many faculty consider it “exploitative” to involve students in their research if the students are not being paid. Another is wide variation in opinions about the level of involvement that merits coauthorship. What I want to do is develop a set of normative guidelines for apprentice-like experiences that do not involve payment, as well as guidelines for those that do. I am working up a draft of this and would appreciate comments, along with reports of good and bad experiences and practices in other programs. So here is my draft. Comments, please. Continue reading “coauthoring norms 1: assisting and junior authoring”
No plan to make a habit of linking to Buzzfeed, but: Orange is the New Black lines inserted into Peanuts strips (HT:RCM). Involves lines from the current season, but no plot spoilers or even recognizable plot points. Mostly liked it because it made me nostalgic and appreciative of how brilliantly angsty the Peanuts aesthetic was for its time.
Nice post by Paul Allison about the conditions under which listwise deletion does as well as or better than multiple imputation (HT: Richard Williams).
What the post doesn’t mention is the crazy rabbit hole that multiple imputation can send students down, often for results that are basically the same. It’s not that missing data can’t cause big problems for estimation, but that many of those problems aren’t solved by multiple imputation either. Multiple imputation just feels like doing more about the problem, and it takes a reviewer only seconds to demand.
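To see the "basically the same" point concretely, here is a minimal sketch with simulated data. The setup assumes values are missing completely at random (MCAR) — one of the conditions Allison discusses under which listwise deletion does fine — and everything here (sample size, true slope, missingness rate) is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)  # true slope is 1.5

# MCAR: knock out 30% of x values at random
miss = rng.random(n) < 0.3
x_obs = x.copy()
x_obs[miss] = np.nan

# Listwise deletion: keep only complete cases
keep = ~np.isnan(x_obs)
slope_listwise = np.polyfit(x_obs[keep], y[keep], 1)[0]

# For comparison, the slope with no missingness at all
slope_full = np.polyfit(x, y, 1)[0]
```

Under MCAR the complete cases are a random subsample, so the listwise-deletion estimate is unbiased and lands very close to the full-data estimate — the extra machinery of imputation buys little here.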
I haven’t followed the Facebook study kerfuffle in any detail, nor have I looked at the study itself. But the ethics of the study have really bothered folks. I do think Facebook is incredibly creepy for the information and power they possess, so I can see why folks’ creepometers would be super-sensitive to Facebook experiments.* Still, I don’t get the freakout. Or, at least, there are existing research designs behavioral scientists use that I’ve already decided that I’m okay with, so it’s hard for me to understand the outrage about the Facebook experiment. Three examples: Continue reading “the ethics of that Facebook study”
“It’s like that story of the tortoise and the hare, and I’m the hare. Only I’m a hare who races ahead, takes a break, and then sort-of forgets about the race entirely. So I start a totally new race against a totally new tortoise. Then at some point, usually in the dead of night, I realize that I’m in like five races against five turtles with five different finish lines, and they’re all about to text to say they’ve reached the end and where the hell am I?”
First thing: is it in The Goldilocks Spot? For this, do not look at the result itself, but look at the confidence interval around the result. Ask yourself two questions:
1. Is the lower bound of the interval large enough that if that were the true effect, we wouldn’t think the result is substantively trivial?
2. Is the upper bound of the interval small enough that if that were the point estimate, we wouldn’t think the result was implausible?
(Of course, from this we move to questions about whether the study was well designed, so that the estimated effects are actually estimates of what we mean to be estimating, etc. But this seems like a first hurdle for considering whether a result presented as positive should be interpreted as possibly being one.)
Caveat: Note that this also assumes the hypothesis is a hypothesis that the effect in question is not trivial, and hypotheses may vary in this respect.
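The two questions above can be sketched as a tiny function. The thresholds `trivial` and `implausible` are hypothetical values the analyst would have to supply in advance for the substantive question at hand; nothing here computes them for you:

```python
def goldilocks_check(ci_low, ci_high, trivial, implausible):
    """Check whether a confidence interval for a positive effect sits
    in the Goldilocks Spot.

    trivial: the largest effect we would still call substantively trivial.
    implausible: the smallest effect we would consider too big to believe.
    Returns (nontrivial, plausible) for questions 1 and 2 respectively.
    """
    nontrivial = ci_low > trivial       # Q1: lower bound not trivially small
    plausible = ci_high < implausible   # Q2: upper bound not absurdly large
    return nontrivial, plausible
```

For example, with hypothetical thresholds of 0.02 (trivial) and 1.0 (implausible), an interval of (0.05, 0.40) passes both questions, while an interval whose lower bound dips to -0.01 fails the first.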
My last post raised some comments about one-tailed tests versus two-tailed tests, including a post by LJ Zigerell here. I’ve returned today to an out-of-control inbox after several days of snorkeling, sea kayaking, and spotting platypuses, so I haven’t given this much new thought.
Whatever philosophical grounding exists for one-tailed versus two-tailed tests is vitiated by the reality that, in practice, one-tailed tests are largely invoked so that one can talk about results in a way one would be precluded from doing if held to two-tailed p-values. Gabriel notes that this is p-hacking, and he’s right. But it’s p-hacking of the best sort, because it’s right out in the open and doesn’t change the magnitude of the coefficient.* So it’s vastly preferable to any hidden practice that biases the coefficient upward to get it below the two-tailed p < .05.
In general, I’ve largely swung to the view that practices allowing people to talk about results near .05 as providing sort-of evidence for a hypothesis are better than the mischief caused by using .05 as a gatekeeper for whether or not results can get into journals. What keeps me from committing to this position is that I’m not sure it doesn’t just change the situation so that .10 becomes the gatekeeper. In any event: if we are sticking to a world of p-values and hypothesis testing, I suspect I would be much happier in one in which investigators were expected to articulate what would constitute a substantively trivial effect with respect to a hypothesis, and then use a directional test against that.
* I make this argument as a side point in a conditionally accepted paper, the main point of which will be saved for another day.
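The arithmetic behind the practice is simple: for a z-statistic in the hypothesized direction, the one-tailed p is exactly half the two-tailed p, which is precisely why a result just short of the two-tailed .05 cutoff can be talked about if one switches tails. A minimal sketch (the z value of 1.8 is made up for illustration):

```python
from statistics import NormalDist

def p_values(z):
    """Two-tailed p for a z-statistic, and the one-tailed p for an
    effect in the hypothesized direction (assumes z > 0)."""
    p_two = 2 * (1 - NormalDist().cdf(abs(z)))
    p_one = p_two / 2  # same test statistic, half the tail area
    return p_two, p_one

# z = 1.8 falls just short of the two-tailed 1.96 cutoff...
p_two, p_one = p_values(1.8)
# ...so the two-tailed p exceeds .05 while the one-tailed p clears it.
```

Nothing about the coefficient or its standard error changes — only the tail area one is allowed to report.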
I recently read a paper in which the author hadn’t offered directional hypotheses. They were just of the form that X was expected to be associated with Y. My reaction was that a non-directional hypothesis is not much of a hypothesis, and I made a comment along the lines of, “you should probably develop your ideas until you have more of a sense of how X and Y might be associated before you try to test the hypothesis with data.”
This led me to wonder whether I have a more general position about the specificity required for something to be a substantively meaningful social science hypothesis. Does anyone have an example of something in social science where the hypothesis is non-directional (or is just a hypothesis that something “matters”), and yet the hypothesis is not trivial? If so, please let me know.
(Not directly about Stapel, but a demonstration of how much antipathy there is among some well-established psychologists for the recent push for direct replication that has been going on there.)
A comment on one of my recent posts brought up the Stapel case in psychology. That was one of the important inciting incidents for the replication drama in social psychology that I’ve been blogging about.
The Stapel case was epic because so many articles were involved and he was definitely a first-stringer: he’d reached considerable stature in his country and was regularly featured in Dutch media. Also, his case was simple, outright fraud, not a matter of “questionable research practices.” Stapel’s case was even responsible for one of the rare instances of an article being retracted by an ASA journal.*
Stapel has a memoir, but it’s in Dutch. Here’s a (translated) money quote from a review, though: Continue reading “stapel and the cookie jar”
I seem to continue to be Making An Effort blog-wise. I have changed the long-past-expiry-date blog banner. I’m not entirely happy with the new one, but I’m far less happy about the time it took to make even that, given everything else I have to do.
The blog would be better if we had recent posts in the sidebar as well as recent comments. Due to legacy features of WordPress.com and our template, I don’t think we can do it without completely re-doing the template. See above re: time for the immediate prospects of this happening.
Kelly pointed to this in the comments on my last post: a CYOA game in which you get to take on different roles and try to prevent a science fraud scandal from happening.
As a connoisseur of CYOA, I don’t think it works that well as a game — too much stuff before and between choices — but the videos have surprisingly high production values and script quality for something like this.
NEW PERSPECTIVES EDITOR SOUGHT
The Theory Section is looking for a new editor or editorial team for its newsletter, Perspectives. We are now soliciting self-nominations from faculty members or students currently enrolled in a sociology PhD program. Teams of 2 or 3 people are very welcome to apply.
Please submit a CV and a 2-page letter of interest (including a short description of how you envision the newsletter) to the members of the Advisory Board:
- Wendy Espeland at email@example.com
- Andrew Perrin at firstname.lastname@example.org
- Robin Wagner-Pacifici at email@example.com
- Gabriel Abend at firstname.lastname@example.org
We look forward to hearing from you!
I’m not sure if the author of this post is a graduate student or undergraduate, but I found it an intriguing statement about the problem younger people interested in methodology can find themselves in while working with established people who are very steeped in conventional practices and productivity. Quote:
One thing that never really comes up when people talk about “Questionable Research Practices,” is what to do when you’re a junior in the field and someone your senior suggests that you partake. […] I don’t want to engage in what I know is a bad research practice, but I don’t have a choice. I can’t afford to burn bridges when those same bridges are the only things that get me over the water and into a job. (HT: @lakens)
Mostly this is just a statement about power.* But it’s also maybe a statement about what can happen when developments allow the possibility of radical doubt to settle upon a field. Normally a junior person can have methodological doubts, but still think, “Well, these people must know what they are doing, because it’s been successful for them and so ultimately in practice it works, right?” But what happens when you have developments that lead to a lot of people starting to whisper and murmur and talk about how maybe it doesn’t work?
* I mean power in the ordinary sociological sense, not my ongoing obsession with statistical power.
This is intended as a friendly didactic post, not an addition to my various criticisms of the hurricane name study. But I do use their data and model. Frankly, I suspect I’ll be thinking about the lessons from that study for a while and using it as a teaching example for years.
I’ve said that substantively it makes more sense to log the measure of hurricane damage, and that the model fits better when you do, even though the key result of their paper is no longer statistically significant. I worry the point may seem arcane or persnickety. So below the jump are a couple of graphs that show the substantive difference that this actually makes over the range of damage observed in their data. (Note the scales of the y-axis.)
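Since the study’s actual data aren’t reproduced here, a minimal sketch with simulated, heavily skewed “damage” shows the substantive difference logging makes. All of the numbers below (the true elasticity of 0.5, the noise scale, the sample size) are made up for illustration; the point is only that when errors are multiplicative, the log-scale model recovers the data-generating process while a raw-scale line is pulled around by the largest observations:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, n)
# Multiplicative errors produce a heavily right-skewed outcome,
# qualitatively like hurricane damage figures.
damage = np.exp(0.5 * x + rng.normal(scale=1.0, size=n))

# Fit on the raw scale vs. the log scale
b_raw = np.polyfit(x, damage, 1)
b_log = np.polyfit(x, np.log(damage), 1)

pred_raw = np.polyval(b_raw, x)
# Back-transform log-scale predictions (ignoring retransformation bias
# for simplicity)
pred_log = np.exp(np.polyval(b_log, x))
```

The log-scale fit recovers the slope used to generate the data, and its back-transformed predictions stay positive across the whole range — whereas a straight line fit to raw damage is dominated by the handful of enormous values at the top of the distribution.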
Blog causation: the night I posted about how Australian restaurant service is worse, we had perhaps the best service that we’ve had at a not-that-expensive restaurant. I was so moved that I left nearly a 9% tip. But, to continue my previous post, here are other differences a no-tipping system makes:
1. The best part: when you are done with your meal, you can head to the register, pay, and leave. Poof! You don’t have to have that whole round where you wait for a bill, and then you wait for the server to come back and collect your money, and then you write in a tip after that. You get up, you pay, you’re gone!
2. The no-tipping custom here is combined with including the tax in listed prices. This means that the total amount you pay at a restaurant is simply the sum of the menu prices of what you ordered. This is so cognitively different from the US that to this day it blows my mind.