With all the poll watching that goes on in elections these days, the question of how accurate the polls are has become more interesting (to me at least). I've been informally tracking for quite some time whether certain polling outfits tend toward liberal or conservative bias. To be clear, I'm not accusing any particular polling operation of deliberate bias; I'm only asking whether their methods (particularly sampling, weighting, data collection, and especially the construction of likely-voter models) trend one way or the other. In the battleground states, at least, that the pollsters, as a group, missed the mark is no great surprise, but the consistency in the direction of that miss is more of a problem, and it suggests they need a collective adjustment in how they go about this.
To look at this, I took the 12 states Real Clear Politics still rated "toss-ups" at election time (OH, FL, VA, NH, NC, MI, WI, PA, IA, CO, NV, MN) and calculated the difference between the "RCP Average" (of recent polls) and the actual vote percentage in each state as of this morning. The result: the RCP Average showed a conservative bias (over-estimating support for Romney) in 11 of the 12 states, Ohio being the only exception. The average miss was 2.4 points, which is probably within the "margin of error" of most of these polls, but the consistency in the direction of the error suggests that some re-tooling is necessary.
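The check described above is just a signed-difference calculation over the toss-up states. Here is a minimal sketch of it; the margins below are made-up placeholders for a few states, not the actual 2012 RCP averages or results, so only the arithmetic is the point.

```python
# Margins are Romney minus Obama, in percentage points.
# These numbers are ILLUSTRATIVE ONLY, not the real 2012 figures.
rcp_avg = {"OH": -2.9, "FL": 1.5, "VA": 0.3, "CO": 1.5}
actual  = {"OH": -1.9, "FL": -0.9, "VA": -3.0, "CO": -4.7}

# Signed miss: positive means the poll average over-estimated Romney.
misses = {s: rcp_avg[s] - actual[s] for s in rcp_avg}
over = sum(1 for m in misses.values() if m > 0)
avg_miss = sum(misses.values()) / len(misses)

print(f"over-estimated Romney in {over}/{len(misses)} states")
print(f"mean signed miss: {avg_miss:+.1f} points")
```

The point of keeping the miss signed rather than absolute is exactly the argument above: a mean absolute error of a couple of points is unremarkable, but a signed mean far from zero indicates a directional bias.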
3 thoughts on “poll bias”
A recent article showed that the people most likely to over-report voting in surveys (i.e., to say they voted when they didn't) are the more educated, better-off, etc.; that is, exactly (a) the people who do vote more, on average (though not as much as we might think) and (b) the people most likely to vote Republican. If pollsters' likely-voter models rely on self-reported turnout or self-reported likelihood of voting, that could explain the Republican lean of those models. I think the registered-voter presidential preferences reported by many outfits were actually closer to the results; or at least splitting the difference between registered-voter and likely-voter numbers would have come closer to the actual outcome.
cite: Ansolabehere, Stephen, and Eitan Hersh. 2012. “Validation: What Big Data Reveal About Survey Misreporting and the Real Electorate.” Political Analysis. http://pan.oxfordjournals.org/content/early/2012/08/27/pan.mps023
"Within the margin of error" is a phrase that should die a painful death. That the reported interval runs from n − x to n + x doesn't mean the true value is as likely to sit at the endpoints as at n; it means "well, we didn't ask the whole population, so it depends on how far our sample is skewed from the whole."
In electoral polling, you know the census data for the whole population, you know the demographics of your sample, and you get a strong indication of respondents' issue awareness as a proxy for participation ("likely voters"). All of that should argue against the SEM-in-a-vacuum calculation that tends to be presented as the "MoE."
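For reference, the "SEM-in-a-vacuum" number being criticized here is, under the usual assumptions, the half-width of a normal-approximation confidence interval for a proportion from a simple random sample. A minimal sketch:

```python
import math

def simple_moe(p, n, z=1.96):
    """Half-width of the 95% normal-approximation interval for a
    proportion p from a simple random sample of size n. Ignores
    weighting, design effects, and non-sampling error entirely --
    which is exactly the complaint above."""
    return z * math.sqrt(p * (1 - p) / n)

# e.g., a 1,000-respondent poll showing 50% support:
moe = simple_moe(0.50, 1000)
print(f"+/- {100 * moe:.1f} points")  # roughly +/- 3.1 points
```

Note that this formula is maximized at p = 0.5, which is why the familiar "±3 points for n ≈ 1,000" figure shows up so often in horse-race coverage.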
Short version of the above: your sample may have started out random, but you know enough about it to know how it maps onto the general population. Presenting the standard MoE bands ignores that information.
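The "you know how the sample maps onto the population" step is, in its simplest form, post-stratification: reweight each demographic cell so its share matches a known population share. A minimal sketch, with all shares and support rates invented for illustration:

```python
# Hypothetical shares: population shares from the census, sample shares
# from the poll's respondents. One binary demographic for simplicity.
population = {"college": 0.35, "no_college": 0.65}
sample     = {"college": 0.50, "no_college": 0.50}

# Post-stratification weight per cell: population share / sample share.
weights = {g: population[g] / sample[g] for g in population}

# Hypothetical candidate support within each cell:
support = {"college": 0.55, "no_college": 0.45}

# Weighted estimate: each cell contributes its POPULATION share,
# not its (over- or under-represented) sample share.
weighted = sum(weights[g] * sample[g] * support[g] for g in population)
print(f"weighted support: {100 * weighted:.1f}%")
```

In this toy example the unweighted sample would read 50.0% support, while weighting the over-represented college cell back down gives 48.5% — which is the information the flat "±x" band throws away.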
Broadcasters may ignore that due to their skewed incentives, but that doesn’t mean the rest of us should.
Is there an alternative you'd suggest for popular reporting? Certainly the MoE discards important information, but discarding some information seems like a practical necessity when reporting sampling error to a nontechnical audience. If you were designing a system for reporting such error to such an audience, what information would you discard, what would you retain, and how would you report it?