I am going to take Dan’s invitation to consider one aspect of the polls that I don’t see getting a lot of attention right now, but that I think could be important: undecided voters could explain much of the polling error being discussed.
In other words, I don’t think that the polls were that wrong. I know that this view puts me in the minority, even among people who think about these things for a living. What we have, I think, is a failure to really consider how we should interpret polls given two very unpopular candidates and a possible “Shy Tory” effect where Trump supporters reported being undecided to pollsters.
Let’s break the vote share down into its component parts:
Estimates of Hillary Clinton’s vote. The polls were nearly spot-on in estimating Clinton’s final popular vote share, both nationally and in battleground states. If anything, polls taken in the final week in battleground states underestimated her final vote share by about a point, well within normal sampling variation. Of the 11 battleground states I looked at, polls overestimated her support only in Ohio (by 0.7 points) and Wisconsin (by 0.5 points).
Estimates of Donald Trump’s vote. The polls underestimated Trump’s final popular vote share by a substantial margin. In the 11 battleground states I looked at, polls underestimated Trump’s support by 4.4 percentage points. That’s a huge miss. Except,
Estimates of undecided voters were very high. This is a point that Nate Silver made about his final estimates yesterday morning. In the 11 battleground states, polls in the final week estimated that an average of 7.3% of voters were undecided, with a wide range from 2.6% in Florida to 15.7% in Minnesota (the number of polls conducted differed by state, so these means carry different levels of sampling error).
Undecided voters were sufficient to cover Trump’s polling deficit in most states. In eight of the 11 battleground states, the difference between Trump’s poll numbers and final numbers could be made up by moving 100% or less of the undecided voters into Trump’s column. The column trump_und in the table below shows the percentage of undecided voters who would have had to vote for Trump to make up the difference between his poll numbers and the vote counts (as I pulled them off of Google’s poll returns page over the course of the day).
trump_vote and clinton_vote are the proportions of the state’s votes won by Trump and Clinton;
trump_poll, clinton_poll, and the undecided column are the (unweighted) averages of percent support for Trump, Clinton, and “Undecided” across all polls in the state ending after November 1 (acquired from the Pollster API). The “Undecided” column includes all responses other than support for Trump or Clinton, including support for Johnson and Stein.
trump_und and clinton_und represent the percentage of undecided voters who would have had to vote for Trump and Clinton, respectively, to account for the gap between polls and returns: numbers greater than 100% mean that there were not enough undecided voters to cover the difference, and negative numbers mean that returns were lower than poll numbers.
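As a rough sketch of the arithmetic behind the last pair of columns: the share of undecideds a candidate needs is the gap between the vote share and the poll average, divided by the undecided percentage. The function name and the figures below are hypothetical, chosen only to illustrate the calculation; they are not taken from the table.

```python
def undecided_share_needed(vote_pct, poll_pct, undecided_pct):
    """Percent of undecided voters a candidate would need to win in order
    to close the gap between the poll average and the actual vote share."""
    if undecided_pct <= 0:
        raise ValueError("undecided_pct must be positive")
    return 100.0 * (vote_pct - poll_pct) / undecided_pct


# Hypothetical state: these numbers are made up for illustration only.
trump_und = undecided_share_needed(vote_pct=48.0, poll_pct=44.0, undecided_pct=8.0)
clinton_und = undecided_share_needed(vote_pct=47.0, poll_pct=48.0, undecided_pct=8.0)

print(trump_und)    # 50.0: half of the undecideds would close the gap
print(clinton_und)  # -12.5: negative, because returns came in below the polls
```

A value above 100 would indicate that even winning every undecided voter could not explain the gap.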
In the states that Clinton ultimately won (or is projected to win) – Nevada, Minnesota, Virginia, Colorado, and New Hampshire – Trump won fewer than two of every three voters who said they were undecided in the last week. Even so, in all five of those states there were enough undecided voters to make up the difference between Trump’s polls and Trump’s returns.
In three states that Trump won – Wisconsin, Michigan, and North Carolina – he did so with between 64% and 100% of the undecided voters’ votes allocated toward his final returns. My estimates of the share of late-deciding Trump voters are not terribly out of line with the exit polls showing the split between Trump and Clinton among those who decided in the final week. Add in a decent percentage of “shy” Trump voters and you can easily produce the numbers here.
That leaves three states where undecided voters could not account for the difference between Trump’s polls and his returns: Pennsylvania, Florida, and Ohio. If we give Trump all of the undecided voters calculated from the final polling average in Pennsylvania, that would only make up 98% of the difference between Trump’s percentage in the final polling average and his final vote share. It’s not inconceivable that “shy” Trump voters could make up the remaining two percent of that gap, though getting all undecideds seems unlikely. In Florida, the total percentage of undecided voters could only cover 87% of the difference between Trump’s polls and his returns. In Ohio, undecided voters only cover 68% of the gap.
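The coverage figures in this paragraph run the calculation the other way: give Trump every undecided voter and ask what share of the gap that closes. Here is a minimal sketch, with a hypothetical function name and made-up inputs rather than the actual state figures:

```python
def gap_coverage(vote_pct, poll_pct, undecided_pct):
    """Percent of the poll-to-vote gap that is covered if every undecided
    voter is allocated to the candidate."""
    gap = vote_pct - poll_pct
    if gap <= 0:
        raise ValueError("no shortfall to cover: the polls did not underestimate")
    return 100.0 * undecided_pct / gap


# Hypothetical state: a 5-point polling miss with only 4% undecided.
coverage = gap_coverage(vote_pct=49.0, poll_pct=44.0, undecided_pct=4.0)
print(coverage)  # 80.0: undecideds alone leave a fifth of the gap unexplained
```

Coverage below 100% here corresponds to a share-needed figure above 100% in the other direction.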
In eight of the eleven battleground states, there were enough undecided voters to make up the difference between the final poll average and result. This doesn’t mean that the polls were “right.” The movement was surprising and polls didn’t give any indication of the degree to which undecideds would move heavily in one direction or another. We should examine the polls to learn lessons to help improve science and journalism. But before we condemn polls as useless, let’s at least be sure to use all of the information that they do provide.
Photo credit: Oliver Tacke, Flickr