li’l didactic aside


Key findings in quantitative social science are often interaction effects, in which the estimated “effect” of a continuous variable on an outcome for one group is found to differ from the estimated effect for another group. An example I use when teaching is that the relationship between high school test scores and earnings is stronger for men than for women. Interaction effects are notorious for being much easier to publish than to replicate, partly because it is easy for researchers to forget (?) that they tested many dozens of possible interactions before finding one that is statistically significant and can be presented as though it had been hypothesized all along.

Various things ought to heighten suspicion that a statistically significant interaction effect is not “real.” Results that imply a plot like the one above practically scream “THIS RESULT WILL NOT REPLICATE.” There are so many ways of dividing a sample into subgroups, and so many variables in a typical dataset that have low correlation with an outcome, that there will inevitably be all kinds of little pockets of high correlation for some subgroup just by chance.
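The multiple-comparisons logic behind that claim can be sketched with a small simulation (purely illustrative; the sample size, number of candidate splits, and variable names are all hypothetical): generate data in which every subgroup shares exactly the same slope, fit an interaction model for each of many arbitrary group splits, and count how many interactions come out “significant” at p < .05 anyway.

```python
# Simulate a null world: one true slope, no real interactions anywhere.
# Then test an x-by-group interaction for each of many arbitrary splits.
import numpy as np

rng = np.random.default_rng(0)
n, n_splits = 500, 60  # hypothetical sample size and number of candidate splits

x = rng.normal(size=n)            # continuous predictor (e.g., test score)
y = 0.3 * x + rng.normal(size=n)  # outcome: identical slope for everyone

false_positives = 0
for _ in range(n_splits):
    g = rng.integers(0, 2, size=n)  # an arbitrary binary grouping variable
    # OLS with an interaction term: y ~ 1 + x + g + x*g
    X = np.column_stack([np.ones(n), x, g, x * g])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X).diagonal())
    if abs(beta[3] / se[3]) > 1.96:  # |t| > 1.96, roughly p < .05
        false_positives += 1

# Under the null we expect about 0.05 * 60 = 3 spurious "interactions."
print(false_positives)
```

Any one of those chance hits, written up alone, would look exactly like a theoretically motivated finding.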

Examples of such findings in the published literature are left as an exercise for the reader.

Author: jeremy

I am the Ethel and John Lindgren Professor of Sociology and a Faculty Fellow in the Institute for Policy Research at Northwestern University.

10 thoughts on “li’l didactic aside”

  1. @olderwoman Very much agreed.

    I’m a little confused why, with all the rampant scientism running about, sociologists don’t hold their quantitative findings to higher standards – requiring researchers to report to reviewers, say, all the code they ran on the data. Ok, that’s probably absurd. But there should be some record kept of all the various substantive tests done, the order they were done in, and some rationale for them. I think one prof here has his graduate students follow biotech lab conventions and record notes on their work in bound, dated lab notebooks, for example. Otherwise, it’s hard to believe any published quant finding that is anything other than overwhelmingly convincing, no?


  2. “THIS RESULT WILL NOT REPLICATE.” Maybe so, but will the “failure to replicate” get published? (I use quotation marks to emphasize that we refer to what is probably good, solid research as a “failure.”)


  3. Damn! I hope that Jeremy doesn’t review the paper I just submitted (otherwise, I *really* need a reviewer voodoo doll). Although I would like to think that the interactions I used were theoretically motivated.

    I don’t keep a bound lab notebook of my research (I tend to lose things pretty easily, only to find them a year later when they are no longer of much practical use). But I have started using a system where I archive all of my Stata .do files, as well as the output created by those .do files, into directories organized by date. Beyond having an answer when a reviewer challenges something, it has also been helpful for going back and recreating what I did in the past, so that a) I don’t rerun the same analysis and b) I can explain what I did in the research process.
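A date-organized archive like the one this comment describes can be sketched in a few lines of shell (the directory names and file patterns here are hypothetical, not the commenter’s actual scheme): copy the day’s .do files and logs into a directory stamped with today’s date.

```shell
# Hypothetical sketch of a date-stamped analysis archive.
stamp=$(date +%Y-%m-%d)          # e.g., archive/2009-03-14
mkdir -p "archive/$stamp"
# Copy the day's Stata scripts and logs; ignore the copy if none exist yet.
cp -- *.do *.log "archive/$stamp/" 2>/dev/null || true
echo "archived under archive/$stamp"
```

Run at the end of each work session, this leaves a dated trail of exactly which code produced which output.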


  4. As someone with really disorganized records, I’m increasingly drawn to the view that we social scientists need to develop our equivalent of the “lab book” culture and get it drilled into people while they are young (i.e. in grad school). As I know some of you know, lab scientists are required to keep bound journals with numbered pages and no blank space; computerized journaling has to generate tamper-proof records. The trouble is, I still can’t figure out the best way to do a lab book equivalent, either for myself or to teach it to my students, although I have gotten better about organizing my Stata .do files.

    Of course not all interactions are ad hoc, but the trouble is that many are supported by very little data, and there is usually too little documentation in an article to clue you in. I once reviewed an article that was making a big deal of a curvilinearity. From the descriptives and a willingness to use a spreadsheet, I was able to complain that there was only one case in the portion of the curve after the bend. But you usually don’t have even that much information about distributions to interrogate an interaction as a reviewer. And if you actually review articles a lot (I get asked to do several a month), you really cannot be expected to re-analyze everybody’s data for them. To Mike I say, theory isn’t enough: you also need enough actual data points to support the claimed interaction with all the controls. And good luck! I hope it works out.


  5. I think it is especially dastardly to interact a categorical variable with a continuous variable whose 0 value lies outside the range of the data and then try to interpret the main effect of the categorical variable. Oh look! I interacted gender with age and found that suddenly gender is highly significant.
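The trap this comment describes can be shown with a small sketch on hypothetical simulated data: with a gender-by-age interaction in the model, the “main effect” of gender is the gender gap at age = 0 – far outside the data if ages run, say, 25 to 65 – whereas centering age makes that coefficient the gap at the mean age. All numbers below are invented for illustration.

```python
# Hypothetical data: a small gender gap plus a gender difference in age slope.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
age = rng.uniform(25, 65, size=n)       # note: age = 0 is far outside the data
female = rng.integers(0, 2, size=n)
y = 10 + 0.5 * age + 1.0 * female + 0.1 * female * age + rng.normal(size=n)

def female_coef(age_var):
    """OLS for y ~ 1 + age + female + female*age; return the female coefficient."""
    X = np.column_stack([np.ones(n), age_var, female, female * age_var])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[2]

b_raw = female_coef(age)                    # gap extrapolated to age = 0 (~1.0)
b_centered = female_coef(age - age.mean())  # gap at the mean age (~1.0 + 0.1*45)

print(b_raw, b_centered)
```

The model is identical either way; only the meaning of the “main effect” changes, which is why the uncentered version can make gender look suddenly, impressively significant.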


  6. So, back in the day, I had a prof who wanted us to include THREE-way interactions in a model. We refused on the grounds that we had no idea how to interpret them. Since the prof couldn’t either (who could?), they didn’t go in. We were constantly “managing” the project in this way.

    Other interactions were included. I like to think they were hypothetically justified, though they probably did fail the replication test.


  7. It is important to distinguish statistical significance and substantive significance. Based on the alpha unadjusted for the multiple comparisons being made, interaction effects are notoriously “over-published.” However, if I am reasonably comfortable with my model specification and the point estimate of the effect is substantively significant (say, compared to some known effects), I would rather err on the over-optimistic side and report it. Not just that, I would discuss a substantively significant but statistically insignificant finding along with other findings even if the reviewers and the editor would not let me publish a paper on something that doesn’t pass the statistical test on the basis of some arbitrary 1/20 error rate.
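The size of the unadjusted-alpha problem this comment raises is easy to compute (the number of candidate tests below is purely illustrative): with k independent tests at alpha = .05, the chance of at least one false positive under the null is 1 − (1 − .05)^k, and a Bonferroni correction restores roughly the nominal rate by testing each at alpha / k.

```python
# Familywise error rate for k candidate interaction tests at alpha = .05,
# before and after a Bonferroni correction. k = 20 is purely illustrative.
k = 20
alpha = 0.05

familywise = 1 - (1 - alpha) ** k            # P(at least one false positive)
bonferroni_alpha = alpha / k                 # per-test threshold after correction
familywise_corrected = 1 - (1 - bonferroni_alpha) ** k

print(round(familywise, 3))            # ≈ 0.642
print(round(familywise_corrected, 3))  # ≈ 0.049
```

So twenty casually tried interactions give better-than-even odds of a publishable “finding” even when nothing is there, which is the sense in which the nominal 1/20 error rate understates the real one.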

