Last week, Andrew Gelman wrote about my earlier post on interaction effects. My thoughts:

1. Are interaction effects important? Yes! If anything, I believe interaction effects are more important than most quantitative social scientists do, because I believe “main effects” are a very useful fiction but fictive nonetheless. **Anything that affects anyone affects different people differently.** And there are reasons why different causes affect some people more than others, which means there are ubiquitous interactions Out There to be found.

2. In behavioral genetics, there is a distinction between “biological” and “statistical” interaction. Roughly, biological interaction refers to an actual substantive process out in the world, and statistical interaction is what you can observe with regression models on population data. You can have statistical interaction in the absence of biological interaction, and vice versa. I wish a distinction between “substantive” and “statistical” interaction would diffuse more broadly; substantive interaction is what social scientists interested in causal inference ultimately need to focus on.

3. The problem is not that interaction effects are not important, but that they are very tricky. Part of the trickiness is how easy it is to be deluded into thinking interaction effects are replicably real when they are not. There are deeper issues, though. One:

Imagine that 30% of white men and 50% of white women have read a novel in the last year (I’ve no idea what the actual percentages are). Now imagine that 15% of black men have read a novel in the last year. What % of black women would correspond to there being no interaction between race and gender? 25% (the same ratio as among whites)? 35% (the same difference)? In my neck of the woods, the most commonly used model implies a null hypothesis of about 29.2%. It takes a lot of unreflective faith in the substantive fidelity of logistic regression to treat deviations from 29.2% as worthy of little asterisks and publication.

4. What I was pointing to in my post is the question of how one goes about deciding which published interaction effects to believe, especially in areas where data limitations make replication sparse (and, for that matter, often not really replication, in that there is no transcendent true parameter that one can pursue with samples from different populations). Three things:

a. *Does the theoretical explanation seem like something thought up after the fact?* (It’s good to practice post hoc explanation, though not in print, so that one gets better at detecting it in other work.)

b. *Does the data include several related measures that it seems like the author could have used but did not?* Worst here is when an interaction is based on a survey item that is one of a set of items in a big study intended to measure a similar construct, especially if the paper makes no mention of the other items. (This is one reason ‘distance breeds enchantment’ with datasets: the more one learns first-hand about a secondary dataset, the less one often comes to believe findings from it.)

c. The phenomenon referred to in my earlier post. The interaction is a group difference in the effect of a continuous variable on an outcome, where the group with the larger effect is also much smaller in size. In evaluating such interactions, at the very least figure out how small the smaller group is, and treat that number as basically the N on which to base one’s judgment of the finding.
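To make the arithmetic in point 3 concrete, here is a minimal Python sketch that computes the “no interaction” prediction for black women under three different nulls: additive on the probability scale, multiplicative (constant ratio), and additive on the log-odds scale, which is what a logistic regression without an interaction term assumes. The function names are hypothetical, and the inputs are the made-up percentages from the example.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Probability corresponding to a log-odds value."""
    return 1 / (1 + math.exp(-z))

def null_predictions(p_white_men, p_white_women, p_black_men):
    """Predicted share for black women if there is 'no interaction',
    under three different definitions of what no interaction means."""
    gender_gap = p_white_women - p_white_men                    # difference scale
    gender_ratio = p_white_women / p_white_men                  # ratio scale
    gender_log_or = logit(p_white_women) - logit(p_white_men)   # log-odds scale
    return {
        "additive": p_black_men + gender_gap,
        "multiplicative": p_black_men * gender_ratio,
        "logistic": inv_logit(logit(p_black_men) + gender_log_or),
    }

for name, p in null_predictions(0.30, 0.50, 0.15).items():
    print(f"{name:>14}: {p:.1%}")
```

Run as written, it prints 35.0%, 25.0%, and 29.2% (the logistic null is exactly 7/24). Three defensible-sounding definitions of “no interaction” give three different benchmarks, so whether a given figure for black women counts as an interaction depends on a scale chosen before anyone looks at the data.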
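Point 4b is about unreported flexibility. A rough way to see why it matters: suppose a dataset has k items tapping the same construct, none of which truly interacts with anything, and the analyst reports whichever one clears p < .05. If the k tests were independent, the chance that at least one does is 1 - 0.95^k, about 22.6% for k = 5; for correlated normal test statistics it lands somewhere between 5% and that figure. The sketch below simulates the correlated case. The equicorrelated-z setup, the `rho` parameter, and the function name are illustrative assumptions, not a model of any particular survey.

```python
import math
import random

def any_significant_rate(k, rho, n_sims=20000, crit=1.96, seed=1):
    """Share of simulated studies in which at least one of k interaction tests
    has |z| > crit even though every true interaction is zero.

    Illustrative assumption: the k z-statistics are standard normal and
    equicorrelated with correlation rho, generated from one shared factor.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        shared = rng.gauss(0.0, 1.0)  # what the related items have in common
        for _ in range(k):
            z = math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
            if abs(z) > crit:  # analyst finds a "significant" item and stops
                hits += 1
                break
    return hits / n_sims

for k in (1, 3, 5):
    simulated = any_significant_rate(k, rho=0.5)
    independent = 1 - 0.95 ** k
    print(f"k={k}: simulated {simulated:.3f}, if independent {independent:.3f}")
```

With one item the simulated rate sits near the nominal 5%; with several related items it climbs well above it, even though the items are substantially correlated.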
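The rule of thumb in point 4c, treating the smaller group’s size as the N, can be backed with a back-of-the-envelope calculation. If each group’s slope is estimated separately (as in a fully interacted model), with similar residual spread and similar spread in x in both groups, the standard error of the difference in slopes is roughly (sigma / sd_x) * sqrt(1/n_small + 1/n_large). That is the same as a single sample of size n_small * n_large / (n_small + n_large), which can never exceed n_small and is close to it when the other group is much bigger. The helper names below are hypothetical, and the formula is a large-sample approximation.

```python
import math

def slope_diff_se(n_small, n_large, sigma=1.0, sd_x=1.0):
    """Approximate SE of the difference between two groups' slopes.

    Large-sample approximation: each group's OLS slope has
    SE ~= sigma / (sqrt(n) * sd_x), and the two estimates are independent.
    Assumes equal residual SD (sigma) and equal SD of x (sd_x) in both groups.
    """
    se_small = sigma / (math.sqrt(n_small) * sd_x)
    se_large = sigma / (math.sqrt(n_large) * sd_x)
    return math.sqrt(se_small ** 2 + se_large ** 2)

def effective_n(n_small, n_large):
    """Single-sample size giving the same SE: n_small*n_large/(n_small+n_large)."""
    return n_small * n_large / (n_small + n_large)

# Same total of 1,000 people, split two ways.
print(effective_n(50, 950), slope_diff_se(50, 950))
print(effective_n(500, 500), slope_diff_se(500, 500))
```

The lopsided split gives an effective N of 47.5 and a standard error of about 0.145, versus 250 and about 0.063 for an even split of the same 1,000 people.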

(2) is gold. Is it like the distinction between statistical conclusion validity and ecological validity?

Also for 4 (c), in graphing interactions, could we use lineweights to signify group size?
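One way to try the line-weight suggestion is a minimal matplotlib sketch that draws one fitted line per group, with width proportional to the group’s size. The intercepts, slopes, and sizes below are made-up inputs; in practice they would come from whatever interaction model was fit. `plot_group_slopes` is a hypothetical helper, not a library function.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

def plot_group_slopes(groups, max_width=6.0, min_width=0.5):
    """Plot one fitted line per group, with line width proportional to group size.

    groups: dict mapping label -> (intercept, slope, n). Values are hypothetical
    inputs, e.g. predictions from a fitted interaction model.
    """
    largest = max(n for _, _, n in groups.values())
    xs = [0.0, 1.0]
    fig, ax = plt.subplots()
    for label, (a, b, n) in groups.items():
        # Linear scaling in n, floored so a small group stays visible.
        width = max(min_width, max_width * n / largest)
        ax.plot(xs, [a + b * x for x in xs], linewidth=width, label=f"{label} (n={n})")
    ax.set_xlabel("x")
    ax.set_ylabel("predicted outcome")
    ax.legend()
    return fig, ax

fig, ax = plot_group_slopes({"majority": (1.0, 0.2, 950), "minority": (0.8, 0.9, 50)})
# fig.savefig("interaction_lines.png")  # write to disk if desired
```

Scaling width linearly with n makes a group of 50 nearly vanish next to a group of 950, hence the minimum width; scaling by sqrt(n), which tracks how a slope’s standard error shrinks, is a reasonable alternative.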

Jeremy’s original post specifically focused on a result where one small subgroup has a markedly different pattern from the rest of the data. To Jeremy’s list of things to watch out for, I’d add tables that show only the coefficients of interest and suppress the coefficients for the control variables, because that hides a warning sign. When coefficients for the controls are given, I have often seen the effect of an important control variable change from large and significantly positive to large and significantly negative when an interaction term is added. If something ELSE in the model besides the “interaction effect” changes that much, it is not clear what the interaction effect means.

Whoa! That’s weird. You mean a control variable that is not either of the variables in the interaction, right? My guess would be a small sample and a problem with collinearity.

@2 – Interesting idea. I haven’t seen that done.

Jeremy @4: exactly! That was also my point in my comment on your first post. I’ve seen zillions of papers with way too many independent variables in them relative to sample sizes. Most real variables have few data points in the extremes of their ranges; this is a normal property of distributions, including normal distributions, not to mention skewed ones. If you get too many variables in a model, coefficients start essentially representing two data points that have similar values at the extreme of one independent variable and different values on the dependent variable. Relatedly, sets of variables that check out fine for correlations at the bivariate level can have extremely high multicollinearity in combinations of 3 or 4 or 5 at a time. This is virtually inevitable with macro data, where there just are not that many cases and independent variables tend to be correlated, but it also happens easily with relatively small (n<200) surveys, especially where one of the subgroups of interest is small (your original point). You can have this problem in models without interaction terms, but interactions make it almost certain if you are not careful.

My training is old and I don’t know what the current best practices are for watching out for this. I’ve taken to writing loops that predict each independent variable from the set of all other independent variables. Sometimes if I throw everything in, I’ll get an R² of .9. That tells me I cannot use all those independent variables at once. I also watch out for pairs of highly correlated independent variables that take large, significant, opposite signs: another symptom of collinearity problems. I also run sensitivity tests: are the results robust when other variables are added to or dropped from models?
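The loop described in this comment is essentially the auxiliary-regression calculation behind variance inflation factors (VIF_j = 1 / (1 - R²_j)). Here is a minimal numpy sketch, assuming the predictors sit in a numeric matrix with no missing values. The simulated data are hypothetical and built to illustrate the bivariate-versus-joint point: a fifth predictor that is nearly the sum of four others correlates only about 0.5 with each of them pairwise, yet every column’s auxiliary R² comes out around .9 or higher.

```python
import numpy as np

def auxiliary_r2(X):
    """For each column j of X, the R^2 from regressing X[:, j] on all other
    columns plus an intercept. The matching VIF is 1 / (1 - R^2)."""
    n, k = X.shape
    r2 = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])      # add intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2[j] = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return r2

rng = np.random.default_rng(0)
n = 150
base = rng.normal(size=(n, 4))
# Hypothetical fifth predictor: nearly the sum of the other four.
x5 = base.sum(axis=1) + rng.normal(scale=0.3, size=n)
X = np.column_stack([base, x5])

corr = np.corrcoef(X, rowvar=False)
print("pairwise r with x5:", np.round(corr[-1, :-1], 2))  # moderate, near 0.5 in expectation
r2 = auxiliary_r2(X)
print("auxiliary R^2:", np.round(r2, 3))
print("VIF:", np.round(1 / (1 - r2), 1))
```

A bivariate correlation matrix would not flag any of these pairs, which is why regressing each predictor on all the others catches problems that pairwise checks miss.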