does a positive result actually support a hypothesis?

First thing: is it in The Goldilocks Spot? To answer, look not at the point estimate itself but at the confidence interval around it. Ask yourself two questions:

1. Is the lower bound of the interval large enough that if that were the true effect, we wouldn’t think the result is substantively trivial?

2. Is the upper bound of the interval small enough that if that were the point estimate, we wouldn’t think the result was implausible?
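
The two questions can be written down as a small check. This is only a sketch in Python: the name `goldilocks_check` and the thresholds `smallest_nontrivial` and `largest_plausible` are hypothetical, and choosing their values is a substantive judgment the code cannot make for you. It assumes a positive effect, so the lower bound is the one nearest zero.

```python
from dataclasses import dataclass


@dataclass
class GoldilocksCheck:
    lower_is_nontrivial: bool  # question 1
    upper_is_plausible: bool   # question 2

    @property
    def in_goldilocks_spot(self) -> bool:
        # Both questions must be answered "yes".
        return self.lower_is_nontrivial and self.upper_is_plausible


def goldilocks_check(ci_lower, ci_upper, smallest_nontrivial, largest_plausible):
    """Compare the bounds of a confidence interval with two judgment calls.

    smallest_nontrivial and largest_plausible are hypothetical thresholds
    that the analyst has to supply from substantive knowledge.
    """
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    return GoldilocksCheck(
        lower_is_nontrivial=ci_lower >= smallest_nontrivial,
        upper_is_plausible=ci_upper <= largest_plausible,
    )


# Illustrative numbers only: an interval of (0.2, 0.6) on some effect scale,
# where anything below 0.1 would be trivial and anything above 1.0 implausible.
check = goldilocks_check(0.2, 0.6, smallest_nontrivial=0.1, largest_plausible=1.0)
print(check.in_goldilocks_spot)
```

Keeping the two answers separate, rather than returning a single yes or no, makes it visible which bound is the problem when an interval fails.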

(Of course, from here we move on to questions about whether the study was designed well enough that the estimated effects are actually estimates of what we mean to be estimating, etc. But this seems like a first hurdle in considering whether what is presented as a positive result should be interpreted as possibly being one.)

Caveat: this also assumes the hypothesis claims that the effect in question is not trivial, and hypotheses may vary in this respect.

Author: jeremy

I am the Ethel and John Lindgren Professor of Sociology and a Faculty Fellow in the Institute for Policy Research at Northwestern University.

14 thoughts on “does a positive result actually support a hypothesis?”

    1. I think most published positive findings are exaggerated positives, and a substantial portion are simply false positives. So at the least we agree, or maybe I have a more radical view.


      1. I am on the same page with you. Most positive findings are made because the person finding them expects certain outcomes from their testing. A lot of testing today, scientific and otherwise, is both biased and too subjective. It is subjective because the people doing the tests tend to want to vindicate their own stance rather than find the truth.


  1. Confidence intervals are underrated. And in cases where you’re doing permutation-based tests, it might make sense to make confidence intervals for your p-values (since you’re estimating them from a sample of all possible permutations) and then check whether your p-value is in the “Goldilocks Spot”.
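
One way to read the idea of a confidence interval around a permutation p-value: when only a random sample of permutations is drawn, the share of permutations at least as extreme as the observed statistic is a binomial proportion, so it carries sampling error of its own. The sketch below is an illustration under assumptions not stated in the comment: a two-sample difference in means as the test statistic, and a Wilson score interval for the proportion. The function names are hypothetical.

```python
import math
import random
from statistics import mean


def permutation_hits(x, y, n_perm=2000, seed=0):
    """Count random relabelings whose |mean difference| is at least the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if diff >= observed:
            hits += 1
    return hits, n_perm


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = hits / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Made-up data for illustration.
hits, n = permutation_hits([10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5])
low, high = wilson_interval(hits, n)
print(f"estimated p = {hits / n:.4f}, interval = ({low:.4f}, {high:.4f})")
```

If the interval straddles the significance threshold you care about, drawing more permutations narrows it. The interval describes only the Monte Carlo error in the p-value, not uncertainty about the effect.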


    1. Confidence intervals around a p-value are a hard concept for me to get my head around, frankly. (Not saying there is anything illegit about them, just that I don’t understand them.)


  2. I don’t suppose it matters, but my cutesy use of Goldilocks spot referred to the sample size problem, not the plausibility or triviality of the estimated effect. Instead of creating rules about confidence intervals, I’d start by training people to understand the different constraints and issues in small sample research versus large sample research.


    1. A main determinant of the capacity of a study to produce an effect in the Goldilocks Spot is its sample size. We may be talking past each other a bit.
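
To make the link to sample size concrete: an interval can only sit between "not trivial" and "not implausible" if it is narrower than the gap between those two thresholds, and its width shrinks roughly with the square root of n. A rough sketch, assuming a normal-approximation interval for a mean with a known standard deviation (both assumptions, as are the function names):

```python
import math
from statistics import NormalDist


def ci_half_width(sd, n, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / math.sqrt(n)


def min_n_for_half_width(sd, target_half_width, confidence=0.95):
    """Smallest n whose interval half-width is at most target_half_width."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sd / target_half_width) ** 2)


# Illustrative values: with sd = 1, how wide is the interval at various n?
for n in (25, 100, 400):
    print(n, round(ci_half_width(1.0, n), 3))
```

Quadrupling the sample size only halves the interval, which is why small studies so often produce intervals whose upper bound is implausibly large.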


      1. The difference is emphasis: you are taking a model’s results as given and discussing how to interpret effect sizes, and I’m back at the start, asking “what kind of analysis can this data support?” and “how can I do my work so I get meaningful and well-supported results?” Too many people are trained to turn a multivariate regression crank (even with fancy variants of link functions and standard error adjustments) without beginning at the beginning and asking what the character of the data is and what kinds of analyses the data can support. And remembering that there will always be SOME pattern in any data, and so asking what the criteria for meaningfulness will be. My own standard has shifted to internal replication: does the result hold up under alternate specifications of the model? This is, however, not a good strategy if one’s goal is publication and career-building. But given that you have gone down the crank-turning path, I agree that your standard for interpreting a coefficient is meaningful. When I applied it to my own data, I was appalled to discover implausibly large effect sizes. Back to the drawing board. Again.


  3. One of the first checks I run is to see if the CI contains zero. If it does, I don’t think there’s really a good interpretation of the point estimate. It seems to happen pretty commonly, and given that, seems to suggest that arguing in terms of direction and significance isn’t scientifically meaningful.


    1. Well, if the CI contains zero, the result is not significant, unless your confidence intervals and significance tests are based on different conventions.
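
The duality can be checked numerically. The sketch below assumes a Wald-type interval and a two-sided z-test built from the same estimate and standard error. That is one common convention, not necessarily what any particular R routine reports, and `wald_summary` is a hypothetical name.

```python
from statistics import NormalDist


def wald_summary(estimate, std_error, alpha=0.05):
    """Return (lower, upper, p_value) under a normal approximation.

    The interval has coverage 1 - alpha; the p-value is two-sided.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    p_value = 2 * (1 - std_normal.cdf(abs(estimate) / std_error))
    return lower, upper, p_value


# Same convention (95% interval, 0.05 test): the two always agree.
print(wald_summary(0.5, 0.2))   # interval excludes zero, p below 0.05
print(wald_summary(0.3, 0.2))   # interval contains zero, p above 0.05

# Mixed conventions: a 90% interval can exclude zero while p is above 0.05.
print(wald_summary(0.35, 0.2, alpha=0.10))
```

The last call shows how the two can appear to disagree: the interval was built at the 90% level while the result is being judged against a 0.05 threshold.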


      1. Yeah I think this has been happening to me a lot because I’ll start with variables that R tells me are significant at some level, and then start looking at a range of skinnier and wider intervals on that variable. That’s an admittedly unsystematic way to set standards and evaluate estimates, and I guess it’s also easy to forget about the duality of the CI and significance testing by toggling between procedures like that.


    2. This exchange is an example of what I’m talking about. Whether variables are significant at some level or not depends both on the effective* sample size and what else is in the model. Specification error is leaving out predictor variables that should be in there as controls (and thus large-sample researchers typically throw in everything possible as controls), but if the sample is small, adding those extra predictor variables makes the models unstable and is likely to yield the kinds of crazy-large coefficients that Jeremy began this whole line of discussion with.

      *When the data are aggregates or other things besides people responding to surveys, the effective sample size is not necessarily the apparent sample size, because of nested data structures.
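
For a rough sense of how much nesting can shrink the effective sample size, one widely used approximation is the design effect for equal-sized clusters, 1 + (m - 1) * ICC, where m is the cluster size and ICC is the intraclass correlation. This is offered as an illustration of the footnote, not as the commenter's own method. The function name and the numbers are hypothetical.

```python
def effective_sample_size(n_total, cluster_size, icc):
    """Approximate effective n for equal-sized clusters.

    Uses the design effect 1 + (cluster_size - 1) * icc, where icc is the
    intraclass correlation (0 = independent observations within a cluster,
    1 = every observation in a cluster is identical).
    """
    if not 0 <= icc <= 1:
        raise ValueError("icc must be between 0 and 1")
    design_effect = 1 + (cluster_size - 1) * icc
    return n_total / design_effect


# Illustrative: 1,000 respondents in clusters of 20 with a modest ICC of 0.1.
print(round(effective_sample_size(1000, 20, 0.1)))
```

Even a modest intraclass correlation can cut the effective sample size to a fraction of the apparent one, which widens the confidence interval accordingly.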


  4. I agree with question #1, and I think confidence intervals are a good way to address this issue. But I don’t agree in all cases with question #2. In a situation where there is low power (usually due to a small sample size) you can get a large point estimate with a lower-bound of the CI that implies a non-trivial effect, but the upper-bound of the CI could imply an effect that is implausibly large. In my view a valid interpretation of this result is that we have reasonable evidence a non-trivial effect exists, although the range of potential magnitudes of the effect is large.


    1. Fair point. You can have cases in exact logistic regression where a confidence bound doesn’t exist at all, but instead is infinity. That doesn’t mean the other bound is uninformative or that the whole estimation exercise is madness.

