The following is a guest post by Juan Pablo Pardo-Guerra.
Here we are, yet again, discussing a paper that, through computational means, resurrects painful ghosts from the past.
Here we are, yet again, discussing the myriad problems of design, inference, and logic behind Michal Kosinski’s work.
Here we are, yet again, trying to painstakingly explain why physiognomy is an intellectual dead end, even when powered by the engines of modern computation.
Here we are. Again.
Michal Kosinski’s paper (“Facial recognition technology can expose political orientation from naturalistic facial images”, Scientific Reports 2021) is a tour de force on how not to do computational social science (or any science, for that matter). The argument largely mirrors his previous and equally polemical work: facial features, extracted computationally from photos posted on social media (which are somehow ‘naturalistic’), allow the prediction of complex human traits, from sexual to political orientations. I won’t go into the weeds of Kosinski’s paper, as other, more competent analysts (including Phil Cohen, Greggor Mattson, Shreeharsh Kelkar, and Andrew Gelman) have done this in the past, and their critiques hold true for this new article. Suffice it to say, he lacks an understanding of the ‘data generation’ process behind what he presents as ‘naturalistic’ images posted on Facebook; he reduces political orientation to a ‘universalistic’ machine-readable binary that is offensive to anyone who has read anything about, well, politics; he assumes his pre-processing is only picking up physical features of faces rather than behaviors encoded into the images; he assumes he understands the output of the off-the-shelf VGGFace2 model; and he selects scores without justification. It is an ungodly mess.
Let’s be absolutely clear: Kosinski’s research is certainly flawed, but that is not the main problem with his paper. What I think matters most, and what is different this time around, is that Kosinski’s paper stands as an avatar for an emerging genre of research that mixes an abundance of computational techniques with the rise of fast-turnaround, pay-to-play journals to make claims about concepts that are central to the established social sciences. Often written by ‘hard’ scientists (think of the equally troubling paper by Safra et al., “Tracking historical changes in trustworthiness using machine learning analyses of facial cues in paintings”, Nature Communications), these papers are akin to a form of ‘computational imperialism’, in which existing disciplinary expertise is ignored in favor of seemingly impressive research designs.
Bad research is unavoidable in science. Indeed, some bad research may even be a necessary condition of science. Yet what we see in Kosinski’s paper and others like it is patently different. It is not just bad science. Rather, it is a science-like claim designed to be controversial (I mean, it’s basically phrenology, for god’s sake), published in a journal that evokes quality but doesn’t guarantee it (Scientific Reports, managed by Nature Research), and one that has avoided the detailed forms of review we associate with more established disciplinary publications (submitted in September 2020, the paper was accepted in a staggering three months). In a way, Kosinski’s work fits these platforms the same way clickbait fits social media: it is research that will generate clicks, citations, and engagement for these online, generalist journals, justifying their pay-to-play model and its attendant onerous fees (less than two weeks after its publication, the paper already has more than 49,000 views). In a very practical sense, Kosinski’s work is to science what fake news is to the public sphere: a calculated intervention that uses the attention economies of academic publishing to create and profit from visibility rather than knowledge. An economy of attention that speaks to our profession’s economy of credit and prestige.
The problem, then, is not primarily bad science, but its amplification through platforms that make it visible and apparently legitimate. This is not entirely the same phenomenon as, for example, the various controversial pre-prints about coronavirus prevalence from 2020, where the institutional status of their authors did much of the work (though Kosinski’s affiliation certainly doesn’t hurt). The difference here is that this is ostensibly authenticated, reputable, peer-reviewed research. While platforms for more responsive and accessible research are certainly a much-needed social good, for-profit publishers have taken advantage of this niche, investing their established reputational capital into new ventures that condition access on financial resources. Nature Research has certainly profited from this glut of fast-tracked open-access publications (a byproduct, perhaps, of our hyper-quantified, productivity-driven profession). Publishing in Scientific Reports, home to Kosinski’s most recent study, costs $1,990, while publishing in Nature Communications will set you back a whopping $5,560. (To give you a better sense of this business model: on January 19 alone, Scientific Reports released about 49 papers, amounting to around $97,000 in income from publication fees. The economy of attention comes at a hefty price.)
These platforms are ill-suited to evaluating the type of research that Safra, Kosinski, and other computationally minded authors claim to have produced as contributions to actual knowledge. Given that these papers present claims squarely about social categories and behaviors (trustworthiness, social organization, orientations, identities), one would expect these journals to have at least a few relevant experts in their ranks. Yet, from what is visible on Nature Research’s pages, not a single social scientist sits on the editorial board of Scientific Reports or Nature Communications. Worse: they even lack data and computer scientists who might be able to provide a more qualified review of these papers, their assumptions, and their methods. Ophthalmologists, geneticists, plant biologists, and oncologists are experts in their own right, but they have no real capacity to evaluate research about political orientation or historical changes in societal trust.
This failure to properly evaluate submissions that are substantively social science is equivalent to the failure of platforms to moderate troubling content in the name of a free and open public sphere. This is particularly clear in Kosinski’s paper. Both in the published version and in semi-private conversations (OK, a squabble on Twitter), Kosinski has presented the article as proof of the dangers of facial recognition technologies. The argument, he notes, is not about physiognomy but about privacy (“What if Putin used this!”, as if Putin didn’t have much cheaper methods of control). I am sure this was appealing to whoever reviewed the paper: it must have seemed a reasonable, socially conscientious argument, a cry in defense of liberty and democracy. But even the most naively trained social scientist would see through this, recognizing that, in its design and argumentation, Kosinski’s was a physiognomic claim at its core. Saying that it is actually about privacy and the threat of authoritarianism is almost a form of gaslighting, amplified and legitimated by the platform on which it travels. (You don’t need to shoot someone to prove that guns are a risk, but it is certainly a flashy demonstration.)
So here we are again, but this time with a twist: Nature Research is $1,990 richer, while everyone else is unfortunately living in a poorer world. This is what is actually at stake: the standing of our craft, our expertise, our discipline, and our legitimacy in the sprawling new economy of clicks, metrics, and improperly used computational wonders.