I am going to take Dan’s invitation to consider one aspect of the polls that I don’t see getting a lot of attention right now, but that I think could be important: undecided voters could explain much of the polling error being discussed.
In other words, I don’t think that the polls were that wrong. I know that this view puts me in the minority, even among people who think about these things for a living. What we have, I think, is a failure to consider how we should interpret polls given two very unpopular candidates and a possible “Shy Tory” effect, in which Trump supporters told pollsters they were undecided.
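A quick back-of-the-envelope calculation shows why a large undecided share matters. The numbers below are hypothetical, not from any actual poll: a poll can report its decided respondents accurately and still “miss” the final margin badly if undecideds break asymmetrically toward one candidate.

```python
def final_margin(cand_a, cand_b, undecided, share_to_a):
    """Return the final margin (A minus B, in points) after
    allocating the undecided share between the two candidates."""
    a = cand_a + undecided * share_to_a
    b = cand_b + undecided * (1 - share_to_a)
    return a - b

# Hypothetical poll: A 46, B 42, 12 points undecided.
poll_margin = 46 - 42                        # reported margin: +4 for A
even_split = final_margin(46, 42, 12, 0.5)   # undecideds split evenly: still +4
shy_break = final_margin(46, 42, 12, 0.3)    # 70% of undecideds break to B: about -0.8
```

Under the last scenario the poll’s decided numbers were accurate the whole time; what looks like a five-point “polling error” is entirely an artifact of how the undecideds broke.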
O’Neil looks out at the land of big data and its various uses in algorithms and sees problems everywhere. Quantitative and statistical principles are badly abused in the service of “finding value” in systems, whether this be through firing bad teachers, targeting predatory loans, reducing the risk of employee turnover by using models that incorporate past mental health issues, or designing better ads to sniff out for-profit university matriculates. Wherever we look, she shows, we can find mathematical models used to eke out gains for their creators. Those gains destroy the lives of those affected by algorithms that they sometimes don’t even know exist.
Unlike treatises that declare algorithms universally bad or always good, O’Neil asks three questions to determine whether we should classify a model as a “weapon of math destruction”:
Is the model opaque?
Is it unfair? Does it damage or destroy lives?
Can it scale?
These questions eliminate the math entirely. By doing so, O’Neil makes it possible to study WMDs by their characteristics, not their content. One need not know anything about the internal workings of a model to attempt to answer these three empirical questions. More than any other contribution O’Neil makes, this opacity-damage-scalability schema, which identifies WMDs as social facts, is what makes the book valuable.
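Because the schema ignores a model’s internals, it can be expressed as a simple checklist over three empirical answers. The sketch below is my own framing of O’Neil’s test, not anything from the book; the field names and the example answers are stipulated for illustration.

```python
from dataclasses import dataclass

@dataclass
class ModelAudit:
    """One audit of a deployed model, answering O'Neil's three questions."""
    opaque: bool     # Is the model opaque to the people it scores?
    damaging: bool   # Is it unfair? Does it damage or destroy lives?
    scalable: bool   # Can it scale?

    def is_wmd(self):
        # A model counts as a "weapon of math destruction"
        # when all three characteristics hold.
        return self.opaque and self.damaging and self.scalable

# Stipulated answers for a hypothetical teacher-scoring system:
teacher_scoring = ModelAudit(opaque=True, damaging=True, scalable=True)
```

Note what the checklist never asks: nothing about coefficients, training data, or algorithms. That is the point of treating WMDs as social facts, observable from the outside.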
Last weekend, Slate announced that it would use social-scientific tools, similar to those used by the campaigns themselves, to anticipate results over the course of Election Day. Slate rejects, in editor-in-chief Julia Turner’s words, the “paternalistic” stance of the traditional media embargo on publishing results during Election Day.
Slate is making a bold move by ignoring the embargo, but in doing so they also appear to be ignoring the flaws of data science and a sacrosanct principle of both social science and journalism: skepticism.
Timothy Carney wrote an article earlier this week decrying what he calls the “rampant abuse of data” by pollsters and the press this election season. He faults North Carolina’s hometown polling company, Public Policy Polling (PPP), among others, for asking “dumb polling questions” such as the popularity of the erstwhile Cincinnati Zoo gorilla Harambe; support for the Emancipation Proclamation; and support for bombing Agrabah, the fictional country in which the Disney film Aladdin is set.
While I agree with Carney that many of the interpretations of these questions are very problematic (and I should note that I have used PPP many times to field polls for my own research), I think he’s wrong that these are dumb questions and that the answers therefore do not constitute “data.” Quite the opposite: asking vague and difficult-to-answer questions is an important technique for assaying culture and, thereby, revealing contours of public opinion that cannot be observed using conventional polling.
A couple of weeks ago I got into a friendly back-and-forth on Twitter with my friend and colleague Daniel Kreiss. Daniel was annoyed by this article, which deploys median-voter theory to purportedly explain why Mitt Romney chose Paul Ryan as his running mate. Daniel’s frustration was this:
I love these studies – complicated models, and no one thought to ask former staffers what went into the decision. https://t.co/t99mfXhyUl
As I have admitted before, I am a terrible electronic file-keeper. If I were to count up the minutes I have wasted in the last 15 years searching for files that should have been easy to find, typing and retyping Stata code that would have (and should have) been a simple do-file, or doing web searches for things I had read and wanted to include in lectures, PowerPoints, or articles but couldn’t place, I fear I would discover many months of my life lost to my organizational ineptitude.
For a long while, these bad habits only affected me (and the occasional collaborator). It was my wasted time and effort. Now, though, expectations are changing and this type of disorganization can make or break a career. I think about my dissertation data and related files, strewn about floppy disks and disparate folders, and I feel both shame and fear.