less trust, moore verification: attention checks reveal errors in alabama poll data

Guest post by Nathan Seltzer

In the days following the publication of a Washington Post article that detailed allegations of sexual abuse against Roy Moore, Emerson College Polling released an election poll of Alabama voters that showed Moore maintaining a 10-point lead over his opponent Doug Jones, 55%/45%. The poll received sustained national press coverage and shaped perceptions of the Alabama Senate race because it was one of the first polls released after the allegations became public. It was conducted using survey data collected over the internet and by landline phone (the IVR sample discussed below).

In my working paper, “Less Trust, Moore Verification: Determining the Accuracy of Third-Party Data through an Innovative Use of Attention Checks,” I analyze raw data from this poll and find irregularities in the internet sample suggesting that the respondents may not have been properly sampled by the data vendor that administered the survey, Opinion Access Corp., LLC.

As researchers increasingly rely on internet data vendors to acquire respondents for polls and surveys, I argue for the necessity of proactively verifying the accuracy of third-party data. In the paper, I detail how researchers can use survey “attention checks” to determine whether data vendors have provided samples that match their requested sampling frame. In the example below, I repurpose two pre-existing questionnaire items from the November 13 Emerson Poll to verify the accuracy of the sample provided by Opinion Access Corp.

Verifying Samples through A Priori Expectations of Variable Distributions

To verify whether the internet sample comprised valid Alabama respondents, I examined the joint frequency distribution of two overlapping geographic variables in the dataset: county of residence and US congressional district.

Alabama counties are nested within congressional districts, although several counties overlap with two or three congressional districts (map here). As a result, we should expect congressional districts to be non-randomly distributed within counties: most counties should have respondents in only one congressional district. Additionally, we should expect respondents to correctly match their county and congressional district – there should be no ambiguity, with the exception of minimal respondent error.
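To make the check concrete, the sketch below shows one way such a verification could be coded. This is a minimal illustration, not the paper’s actual procedure: the file names, the column names ("county", "cong_district"), and the county-district crosswalk are hypothetical placeholders, and the crosswalk would need to list every valid Alabama county-district pair (with counties that span multiple districts appearing once per district).

```python
# Minimal sketch of the county-district attention check described above.
# Assumptions (not from the original paper): respondent records live in a CSV
# with columns "county" and "cong_district", and a crosswalk CSV lists every
# valid Alabama county-district pair.
import pandas as pd

respondents = pd.read_csv("emerson_internet_sample.csv")      # hypothetical: one row per respondent
crosswalk = pd.read_csv("alabama_county_district_pairs.csv")  # hypothetical: valid (county, district) pairs

# Normalize text so that, e.g., "Autauga " and "autauga" compare equal
for df in (respondents, crosswalk):
    for col in ("county", "cong_district"):
        df[col] = df[col].astype(str).str.strip().str.lower()

# Flag respondents whose reported pair is not among the valid pairs
valid_pairs = set(zip(crosswalk["county"], crosswalk["cong_district"]))
respondents["valid_pair"] = [
    (county, district) in valid_pairs
    for county, district in zip(respondents["county"], respondents["cong_district"])
]

# Joint frequency distribution (the basis of the heat map) and the invalid-pair rate
joint = pd.crosstab(respondents["county"], respondents["cong_district"])
error_rate = 1 - respondents["valid_pair"].mean()
print(joint)
print(f"Share of respondents with an invalid county-district pair: {error_rate:.1%}")
```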

In the figure below (Figure 2 in the paper), I graph the joint frequency distribution of respondents by county and congressional district for both the internet sample and the IVR phone sample. The rows of the graph correspond to county of residence, while the columns correspond to the respondents’ specified congressional districts. The dark blue boxes represent clusters of one or more respondents, while the light grey boxes represent no respondents. Importantly, the red x-marks indicate the cells that constitute valid county-district matches; all other cells in the heat map represent illogical and invalid county-district pairs.


Heat Map Depicting Joint Distribution of Counties of Residence and Congressional Districts for Respondents in the Internet and IVR Samples.

Notes: Correct Match refers to valid/logical matches of counties and congressional districts. All other cells represent invalid/illogical county-district pairs. Blue cells indicate that one or more respondents reported living in the corresponding county and congressional district.

While the IVR phone sample matches our a priori expectations for how congressional districts should be distributed within counties, the internet sample does not. In fact, 117 out of the 324 internet respondents (36.1%) were unable to accurately match their county of residence to their US congressional district.

In Autauga County, for instance, which lies in central Alabama within District #2, none of the respondents from the internet sample selected District #2. Instead, they indicated that their congressional district was District #1, District #3, District #4, or District #7, all of which are incorrect.

It is unclear why respondents in the internet sample failed to correctly match their congressional districts to their counties of residence. In the online questionnaire, respondents were shown a map that overlaid congressional districts on county boundaries and were then asked to indicate their congressional district. This should have been a simple task for respondents who knew where they lived within the state. To be sure, it is possible that the divergence between the joint distributions in the internet and IVR phone samples has a practical explanation that is not easily inferred from the publicly released survey methodology. But when the error rate in the internet sample surpasses a third of all respondents, such an explanation seems implausible.

Less Trust, Moore Verification

Third-party internet panel vendors provide a cost-effective and time-efficient option for conducting survey research. However, data vendors often have aims and motives that do not align with those of academic researchers. By default, researchers should be skeptical of the accuracy of data provided by third parties. Ultimately, it is the researcher’s responsibility to determine the fidelity of the data they use in their analysis.

Although the aim of the paper is not to predict the outcome of an electoral contest, removing this poll from aggregate polling averages might indicate a tighter Alabama Senate race than previously understood. On November 28, Emerson College Polling released an additional poll of support for Roy Moore and Doug Jones in the Alabama Senate race that similarly relied on respondents acquired through Opinion Access Corp. If the same irregularities observed in the November 13 poll are present in the more recent poll, then political observers should interpret its results with the understanding that a substantial number of respondents may have been invalidly included.

Nathan Seltzer is a PhD student in Sociology at the University of Wisconsin-Madison and a trainee at the Center for Demography and Ecology.

are adjuncts asked to write too many reference letters?

A Twitter exchange in response to my post saying that mediocre students deserve reference letters raised the problem of adjuncts’ reference-writing woes. Some adjuncts apparently get asked to write a lot more letters of reference than many full professors. Some of the people being asked to write a lot of letters are contingent faculty who are already overworked for poverty wages, and it seems particularly unjust to expect them to shoulder this burden. My Twitter exchange was with an adjunct who teaches in five different departments and holds a postdoc besides, so I’m going to assume that the wage per course for this person is low. There are, of course, other adjuncts who are in regular non-contingent positions at reasonably good wages whose situation is somewhat different.

Writing a reference letter for an undergraduate takes at least 3 hours. Continue reading “are adjuncts asked to write too many reference letters?”

do B-average undergrads deserve letters of reference?

Once again there are discussions about writing letters of reference on my social media. Some people seem to believe that getting a letter of reference is a privilege that only the very best students deserve, and that instructors ought to put a cap on how many students they will write letters for. Some of the arguments are based on managing instructors’ workloads. Coming from the pro-student side, there are also people who argue that letters of reference should always be excellent letters that can really help a student’s career, which would seem to imply that letter-writers should decline to write at all if their letter would be merely tepid. (See below for samples.) This latter discourse also seems to imply that all students are excellent, or at least deserve to be written about as if they are excellent. So it is a real question: Do undergraduates who have failed to form close relations with faculty deserve letters of reference? Do mediocre undergraduates deserve letters of reference? My answer to both is yes. Continue reading “do B-average undergrads deserve letters of reference?”

an object lesson in good interviewing and public health

I have recently experienced an example of a persistent and rigorous interview that yielded an unexpected payoff. I somehow contracted a Giardia infection, a parasite usually associated with contaminated water. The first questions anyone knowledgeable asks are “Were you drinking out of streams?” and “Were you drinking well water?” because there is a problem in rural areas with contaminated water, and wells can become contaminated, especially when there is a lot of flooding, as there was this summer. But I am a city person and only drink tap water. No, I haven’t been camping; I have not been drinking out of streams. I thought maybe there was a sick food service worker? Maybe contaminated tap water at a rural gas station on our trip to Duluth? Hard to know.

It turns out that Giardia is a reportable public health infection, so I got a call this week from a public health student. Continue reading “an object lesson in good interviewing and public health”

exercising judgment in teaching about controversial issues

My department has run a number of workshops (organized by grad students) on “teaching about race.” They asked me to speak about the rules governing what we can and cannot say in the classroom. I was pretty sure I knew the “rules” but asked our Provost for the official statement. Interestingly, there was none, but the question was referred to the Legal department. After a delay, Legal Affairs sent back an email citing Wisconsin state statutes and linking to some policy statements. I’ve pasted the original correspondence below.* First a student and I translated the legalese into English bullet points. Then I wrote an essay about how to think about authority and ethical responsibility in teaching controversial topics. The essay was recirculated this fall, and since I’ve gotten positive feedback on it, I decided to post it here, with a few more edits, in case it is helpful. There’s always more to say, and legitimate disagreement about how to handle some things. Feel free to use the comments to expand on these points. Continue reading “exercising judgment in teaching about controversial issues”

bad behavior, secrecy, duplication, science

I’ve been mulling over Phil Cohen’s insistence that there is a serious problem with reviewer malfeasance, specifically in using the anonymity of peer review to prevent the publication of work that would encroach on the reviewer’s “turf.”  My first response was that Phil must be exaggerating, or in a bad field. But then I remembered that I do know for sure of one case in which someone I know did everything they could to block the funding and success of a project whose research idea and methodology (but not data) were similar to their own. And then, as I think about it, I know of another pocket of cases in which senior people published their own work on several related variations of topic X even though they had commented on, and thus knew of, prior working papers by graduate students on that topic. They did not even have the grace to cite those prior papers, much less cede the turf to the people who had originated the research ideas. Is this kind of behavior as common as Phil seems to think it is? And, if so, what ought we to do about it? Continue reading “bad behavior, secrecy, duplication, science”

on sharing work in progress and anonymity

I got involved in a debate over at orgtheory about the pluses and minuses of putting working papers online at SocArXiv (or elsewhere). That debate was tangled up with a variety of issues around the proposal to require public posting of papers that win (or are submitted to) section paper award competitions.

In this post I want to avoid that tangle of other issues and open discussion/debate on the narrower question of whether sociology as a discipline should do all it can to move toward the model of other fields, where working papers are routinely placed on public archives before they go through peer review for ultimate publication.

The sociology model as it is generally practiced involves writing a paper, presenting it at conferences and circulating drafts of it around for a year or more, submitting it to a journal, going through several iterations of rejections and R&Rs, and finally getting it published maybe 4 or 5 years after the work was originally done. In the meantime, some people (those you were at conferences with or to whom you sent the paper) know about the work, while others working in the same area may not know about it and thus will not cite it or be influenced by it; junior scholars worry that their work will be scooped by a more senior person who gets the idea from a circulating PDF or as an anonymous reviewer; and knowledge as a whole bogs down.

The alternative model practiced in many fields is: (1) Do the work and present it at conferences as the work evolves. Be known as the person/team working on problem X because you have talked about it at multiple conferences. (2) Post a working paper on ArXiv or SSRN etc. as soon as you think you have something to report. (3) Other people cite and debate your work based on the ArXiv or SSRN etc. version. If it is wrong it gets called out and fixed. If it is novel and correct, you get invited to more conferences to discuss it and you learn about the work others are doing in the same field. (4) Your paper slogs its way through peer review and ultimately gets published; then you link to the published version from the working paper site. Continue reading “on sharing work in progress and anonymity”