Both the academy generally and the social sciences specifically are rife with inequality. Black and Latine people are underrepresented among sociology PhDs and faculty; people whose parents have PhDs are dramatically overrepresented; women are awarded less grant funding than men; and academia can be a hostile environment for LGB and especially trans scholars. Yet, despite considerable interest in these issues, it is remarkably difficult to study demographic inequality in critical parts of the academy like publishing, for the simple reason that the necessary data either do not exist or cannot be linked. The NSF collects data on students and faculty. Professional associations collect data on members. But journals and publishers generally don’t collect and share data on authors. Christin Munsch and I are changing that.
Of course, others are working in this space as well. Chief among them is the IRIS team’s UMETRICS data, which links data on individual scholars from select universities with their institutions, mentors, funding, spending, publishing, promotion, and more. It is a fantastic data set! But, with only 26 universities included, what it can tell us about academic publishing across entire fields is limited.
To date, scholars interested in publication, review, and citation inequalities have typically inferred authors’ gender or race/ethnicity from their published names. I linked here to two particularly careful and insightful examples. Sometimes ascribed gender or race is what we want to measure. If we believe an effect (like choosing to cite a person) operates through observers’ assumptions, then it makes sense to study what observers see as “Black names” or “feminine names.” Likewise, data on observer-ascribed gender can be very useful for studying trans lives as a measure of misgendering, as Danya Lagos has shown.
But imputing demographics from names to understand other aspects of identity is a … fraught process. For example, one of the most popular and accurate tools for ascribing gender to names misclassifies any woman in the US going by the names Alex, Chris, Sam, or Parker as “gender : male” with “probability : 0.98”. Moreover, because these tools are invariably limited to the labels “male,” “female,” and occasionally “unknown,” they inherently misgender people outside that binary. Such tools not only make it impossible to study trans and nonbinary academics’ publishing. The tools write them out of existence. As such, they have attracted numerous critiques. Similarly, research on race/ethnicity using name data is often limited to comparisons like “East Asian vs British origin names,” which elides considerable variation within and outside those categories. Other demographics cannot be studied with names at all: there are no tools for imputing sexuality or socioeconomic status from names, for example (unless you’re a Medici).
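To make the failure mode concrete, here is a toy sketch of how name-based imputation tools generally work: a lookup table mapping names to a binary label and a confidence score derived from aggregate name statistics. The table entries below are invented for illustration and are not drawn from any real tool’s data; the point is that the structure itself cannot represent anyone outside the binary.

```python
# Illustrative only: a hypothetical name -> (label, probability) table,
# mirroring the structure of binary name-classification tools.
# Entries are made up for this sketch, not taken from any real tool.
NAME_TABLE = {
    "alex": ("male", 0.98),
    "chris": ("male", 0.95),
    "sam": ("male", 0.96),
    "mary": ("female", 0.99),
}

def impute_gender(name):
    """Return (label, probability); ('unknown', 0.0) for unlisted names.

    Note the output space: only 'male', 'female', or 'unknown'.
    Nonbinary identities cannot be returned at all, and every person
    with a majority-male name gets the same confident answer.
    """
    return NAME_TABLE.get(name.lower(), ("unknown", 0.0))

# A woman or nonbinary person named Alex is classified identically
# to every other Alex in the table:
print(impute_gender("Alex"))
```

However large the underlying name corpus, the method can only ever echo the majority label for a name, which is exactly why it misgenders people with unisex names and erases nonbinary scholars entirely.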
The best way to get inclusive and accurate demographic data is to ask. In the coming weeks, Christin Munsch and I will launch a quick, 2-minute survey asking the authors of every paper published in a sociology, economics, or communication journal in the last five years basic demographic questions about gender, sexuality, race/ethnicity, disability[1], parents’ education, and career stage. Even though I’ve been doing bibliometrics research for a while, I find the scale staggering. In sociology journals alone, authors have listed over 28,000 unique email addresses. If the response rate is high, we’ll be able to provide valuable data to the social science community on equity in publishing that can support intersectional, feminist analyses.
Obviously, confidentiality is a central concern. We are asking you—our colleagues—to trust us with potentially sensitive information. As past chair of the Sociologists’ LGBTQ Caucus, I know the importance of being able to control disclosure of our identities in professional environments. ASA has demographic data for most of its members, but it rightly refuses to share that data. Our data collection will likewise be confidential. We will not share identifiable data outside our study team, and within the team it will be handled with the same data security protocols as other sensitive data.
Even if you only answer a few of the questions and skip others, your answers are important for understanding our field. So please be on the lookout for our email with the survey in the coming weeks. We hope you will take 2 minutes to fill it out.
[1] Added at the suggestion of Molly King – a valuable correction to our initial oversight.