The authors of the hurricane name study have amended their statement to include responses to some of my criticisms (starting around p. 4). They don’t respond to this one, which is extremely fundamental.
I’m not going to bother going through the whole thing, because I’ve already spent too much time on a study so obviously and irresponsibly flawed. But, in case there is any ambiguity about the competence level I’m up against here, let’s just consider their argument about logging variables:
Freese also argued that we should have logged normalized damage, which we standardized in our final model. However, the range of a standardized variable is from −X to +X with 0 as a mean. When you log a negative number, it becomes missing. In the case of normalized damage, this eliminates 67 observations, or most of the data (see Appendix). If this was his suggestion, that is not a viable approach.
Hurricane damage is measured in dollars. When hurricanes inflict damage, hurricane damage is always a positive amount of dollars. You can log that amount. If you want to standardize it afterward, fine.
Instead, these people assert that the way to do it is to standardize the variable first, which makes many values negative, and then log the variable, dropping all those negative values. This is just incompetent.
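To see the order-of-operations problem concretely, here is a minimal Stata sketch (it assumes the damage variable NDAM is strictly positive, and the name ZNDAM for its standardized version is mine, not theirs):
. egen ZNDAM = std(NDAM) // standardize first: values below the mean become negative
. gen lnZNDAM = ln(ZNDAM) // ln() of a negative number is missing in Stata
. count if missing(lnZNDAM) // most observations drop out, as they report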
And they don’t stop there! They go on:
Recall as well that the negative binomial model itself has a log link function that internally logs the linear predictor prior to estimation. If we did not standardize the continuous predictors, and they were in fact not linear in effect, we would have of course transformed them appropriately.
Yes, the negative binomial model has a log-link function. But the implication is not that you are effectively logging the independent variables; it’s that you are effectively logging the dependent variable. (This is why, if you have an outcome without 0’s, count models often give results similar to just logging the outcome and using OLS.) So once again they have it backwards.
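Here is a minimal sketch of that parenthetical point, using hypothetical variables y (a count with no zeros), x1, and x2:
. nbreg y x1 x2 // the log link relates the predictors to the log of the expected count
. gen lny = ln(y) // with no zeros, you can log the outcome directly
. regress lny x1 x2 // OLS on the logged outcome usually tells a similar story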
Let’s think about this substantively. Their model says the rate at which people die depends on the absolute dollar amount of damage. So if increasing a hurricane’s dollar damage by some specific amount doubles the number of deaths, then increasing the damage by that same amount again doubles the number of deaths again. This is why they get such crazy predictions from the so-called “sophisticated count model,” predictions that can be replicated in half a tweet.
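In symbols (my restatement, not theirs, with $b$ the damage coefficient and the other terms in their model omitted):

$$E[\text{deaths}] = \exp(a + b\,\text{NDAM}), \qquad \frac{E[\text{deaths} \mid \text{NDAM}+\Delta]}{E[\text{deaths} \mid \text{NDAM}]} = e^{b\Delta}$$

Every additional $\Delta$ dollars of damage multiplies the expected death toll by the same factor, regardless of how large the storm already is.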
A more sensible view of the world is that if you increase the damage of a hurricane by some %, you also increase the number of deaths by some %. So if you double the damage of a hurricane, there will be some % increase in the number of deaths, and to get that same % increase in deaths again, you have to double the damage again.
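The logged-damage specification says exactly that (again my restatement, other terms omitted):

$$E[\text{deaths}] = \exp(a + b\,\ln\text{NDAM}) = e^{a}\,\text{NDAM}^{\,b}, \qquad \frac{E[\text{deaths} \mid 2\,\text{NDAM}]}{E[\text{deaths} \mid \text{NDAM}]} = 2^{b}$$

Doubling damage multiplies expected deaths by the same factor at every scale.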
So, logging damage makes a lot more sense, and I was able to confirm in a few seconds that logging the damage variable fits better (pseudo-R2 = .12 vs. .08). Of course, their key result is no longer significant when you do this (p = .20).
In case anything above sounds mysterious, here’s the Stata code so you can confirm what I’ve done yourself:
. gen lnNDAM = ln(NDAM) // how to log a variable
. egen ZlnNDAM = std(lnNDAM) // standardize it afterward
. nbreg alldeaths c.ZMasFem##(c.ZlnNDAM c.ZMinPress)
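For the fit comparison reported above, something along the following lines should reproduce it; this is a sketch, and the name ZNDAM for the authors’ standardized raw-damage variable is my assumption (use whatever it is called in their replication file):
. egen ZNDAM = std(NDAM) // the authors’ standardization of raw damage (skip if created earlier)
. nbreg alldeaths c.ZMasFem##(c.ZNDAM c.ZMinPress) // their specification
. display e(r2_p) // pseudo-R2 for the raw-damage model
. nbreg alldeaths c.ZMasFem##(c.ZlnNDAM c.ZMinPress) // logged-damage specification
. display e(r2_p) // pseudo-R2 for the logged-damage model; the interaction p-value is in the output table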