why log?

This is intended as a friendly didactic post, not an addition to my various criticisms of the hurricane name study. But I do use that data and model. Frankly, I suspect I’ll be thinking about the lessons from that study for a while and using it as a teaching example for years.

I’ve said that substantively it makes more sense to log the measure of hurricane damage, and that the model fits better when you do, even though the key result of their paper is no longer statistically significant. I worry the point may seem arcane or persnickety. So below the jump are a couple of graphs that show the substantive difference that this actually makes over the range of damage observed in their data. (Note the scales of the y-axis.)

Predictions for hurricanes named “Andrew” and “Bonnie” implied by their published model, with damage unlogged:

[Figure: unlogged]

Predictions when you log damage:

[Figure: logged]

See, when you log damage, you aren’t suddenly making predictions within the observed range of data about hurricanes that would have killed 12 times more people than Katrina.

A different issue is that even though the model no longer makes absurd predictions, the predicted difference in deaths between “female hurricanes” and “male hurricanes” is still implausibly large for any behavioral model of why people die in hurricanes. And even then the coefficient for this interaction effect between damage and hurricane name is not significant. In other words, even non-significant effects imply implausibly large predictions. This is what it means to say the study is severely underpowered: even if the effect were massive, more massive than it seems conceivably rational to expect, it would still be too small to be statistically discernible.

Author: jeremy

I am the Ethel and John Lindgren Professor of Sociology and a Faculty Fellow in the Institute for Policy Research at Northwestern University.

9 thoughts on “why log?”

  1. Is that really a reason to log it: because the predictions from the model you happen to have are more reasonable? I think that’s a nice outcome, but I think you need a theoretical reason to log it independent of that. But I could be wrong.


  2. Good point. The theory–well, the substantive rationale–is straightforward. It’s just that it makes more sense to suppose that a multiplicative change in the number of deaths is a function of a multiplicative change in the damage. So whatever % change in damage doubles the number of deaths, you’d need to have that % change again to double it again.

    This is way more reasonable than imagining that additive changes in damage result in multiplicative changes in deaths. Why would each $x million in damage result in the same % increase in deaths?
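As a sketch of the two specifications being contrasted here, write $Y$ for expected deaths and $D$ for damage. The coefficients $a$ and $b$ are illustrative symbols, not estimates from the paper, and the log on the left-hand side is assumed from the discussion of multiplicative changes in deaths.

```latex
\begin{align*}
\text{damage unlogged:}\quad & \log Y = a + bD
  &&\Rightarrow\quad Y = e^{a}\,e^{bD} \\
\text{damage logged:}\quad & \log Y = a + b\log D
  &&\Rightarrow\quad Y = e^{a}\,D^{b}
\end{align*}
```

In the first form, adding a fixed amount $\Delta$ to $D$ multiplies $Y$ by $e^{b\Delta}$, whatever the starting level of damage. In the second, multiplying $D$ by a factor $k$ multiplies $Y$ by $k^{b}$, so the same percentage change in damage is needed each time to double deaths.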


  3. I’ll go ahead and take the hit and raise my hand in case other kids in the class are baffled too. Can you explain this a different way or put down some toy numbers to make this concrete?


  4. Grahamalam: I’m pretty certain what Jeremy is saying goes like this…

    When you log the IV and the DV, a given percentage increase in the IV produces a given percentage increase in the DV. So, for example, a 1% increase in damage is always associated with a 2% increase in deaths.

    In contrast, when you use the raw scores for the IV and a logged DV, you’re implying that a unit change in the IV, in raw score units, is associated with a percentage change in the DV. So, for example, a $1 million increase in damage is always associated with a 10% increase in deaths.

    The difficulty arises because a percentage change is, as Jeremy says, a multiplier. A 20% increase is equivalent to multiplying something, a baseline number of deaths that we’ll call x, by 1.20.

    The result in the second case is that each additional unit of damage causes a larger and larger multiplier to be applied to x. So, in a sense, each additional $1 million in damages has a greater effect than the one that preceded it, which seems a bit odd. Does that make more sense?
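To put toy numbers on the two cases above, here is a minimal Python sketch. It reuses the made-up figures from this comment (a 2% rise in deaths per 1% rise in damage, and a 10% rise in deaths per extra $1 million). The function names, the baseline of 1, and the damage levels are hypothetical and chosen only for illustration; none of this comes from the hurricane data.

```python
def deaths_raw_damage(damage, base=1.0, pct_per_million=0.10):
    # Logged deaths, raw damage: every extra $1M multiplies deaths by 1.10.
    return base * (1 + pct_per_million) ** damage


def deaths_logged_damage(damage, base=1.0, elasticity=2.0):
    # Logged deaths, logged damage: deaths scale as damage ** elasticity,
    # so a 1% rise in damage gives roughly a 2% rise in deaths.
    return base * damage ** elasticity


def doubling_ratios(model, damages=(10, 20, 40, 80)):
    # Factor by which predicted deaths grow each time damage doubles.
    return [model(2 * d) / model(d) for d in damages]


raw_ratios = doubling_ratios(deaths_raw_damage)
log_ratios = doubling_ratios(deaths_logged_damage)

for d, r, l in zip((10, 20, 40, 80), raw_ratios, log_ratios):
    print(f"${d}M -> ${2 * d}M: raw damage x{r:.1f}, logged damage x{l:.1f}")
```

With damage logged, each doubling of damage multiplies predicted deaths by the same factor. With raw damage, the factor itself keeps growing as the storms get bigger, which is the runaway behavior described above.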

    And Jeremy, did I just horribly misrepresent your argument?


    1. Kind of. My confusion is definitely over the translation of additive versus multiplicative effects. I get the interpretation of coefficients in a log-log vs. level-log models, but not the rest of this: “The result in the second case is that each additional unit of damage causes a larger and larger multiplier to be applied to x. So, in a sense, each additional $1 million in damages has a greater effect than the one that preceded it, which seems a bit odd.”

      Though it would certainly seem odd that you get an increasing return of deaths relative to the scale of damage. That would imply that relatively small storms were annihilating entire populations, and we’d have fewer 5 o’clock news interviews with the colorful guy who’s always left standing in the trailer park afterward.


    2. Yes, Matt’s way of saying it is a clearer way of putting the matter than what I said.

      Another way to see concretely what’s going on is to look at the blue lines in the plots in the post. If you don’t log it, you get the top plot, with the increasing “returns” to absolute increases in damage. Why would one ever expect that? The logged version in the bottom plot seems to me obviously substantively more plausible, and, lo, it fits the data better.

