experimental vs. statistical replication

In the context of all of the debates about replication going on across the blogs, it might be useful to introduce a distinction: experimental vs. statistical replication.* Experimental replication is the more obvious kind: can we run a new experiment using the same methods and produce a substantially similar result? Statistical replication, on the other hand, asks, can we take the exact same data, run the same or similar statistical models, and reproduce the reported results? In other words, experimental replication is about generalizability, while statistical replication is about data manipulation and model specification.

On the one hand, sociology, economics, and political science all have ongoing issues with statistical replication. The big Reinhart and Rogoff controversy was the result of an attempt to replicate a statistical finding that revealed some unreported shenanigans in how cases were weighted, and that some cases were simply dropped through error. Gary King’s work on improving replication in political science aims at making this kind of replication easier, and even turning it into a standard part of the graduate curriculum. Similarly, I believe the UMass paper (failing to) replicate Reinhart and Rogoff emerged out of an econometrics class assignment (e.g.) that required students to statistically replicate a published finding.

On the other hand, psychology seems to have a big problem with experimental replication. Here the concerns are less about model specification (as the models are often simple, bivariate relationships) or data coding, but rather about implausibly large effects and “the file drawer problem” where published results are biased towards significance (which in turn makes replications much more likely to produce null findings).

Both of these kinds of replication are clearly important, but they present somewhat different issues. For example, Mitchell’s concern that replication will be incompetently performed and thus produce null findings when real effects exist makes less sense in the context of statistical replication, where the choices made by the replicator can be reported transparently and the data are shared by all researchers. So, as an attempt at an intervention, I propose we try to make clear when we’re talking about experimental replication vs. statistical replication, or if we really mean both. Perhaps we might even call the second kind of replication something else like “statistical reproduction”** in order to highlight that the attempt to reproduce the findings is not based on new data.

What do you all think?

* H/T Sasha Killewald for a conversation about different kinds of replication that sparked this post.
** Think “artistic reproduction” – can I repaint the same painting? Can I re-run the same models and data and produce the same results?

do we think one head is better than two?

Robb Willer sent me a link to this study “When Multiple Creators Are Worse Than One: The Bias Toward Single Authors in the Evaluation of Art.” It presents a series of experiments suggesting that people have a lower evaluation of artwork if it is presented as a collaborative effort rather than as a work of a single artist.

Of course this gets one thinking about the strong premium that is placed in some quarters of sociology on sole-authored work. Granted, this usually comes up in the context of individual evaluation, with the argument that it is hard to determine what the contribution of one person is on a multiple-authored work. But can it have consequences for the evaluation of the work itself? Given that the findings of the experiment are about art, one possibility is that bias varies along the humanities-science spectrum in sociology, where there’s bias toward single-authored work in humanities-oriented sociology and perhaps even against it in science-oriented sociology.

ritzer on ritzer on ritzer

George Ritzer, Editor in Chief, The Wiley-Blackwell Encyclopedia of Sociology, 2nd Edition, in an email to me:

We would like to invite you to contribute to Wiley-Blackwell’s Encyclopedia of Sociology, Second Edition, under the general editorship of George Ritzer… The Author will be entitled to receive access to the electronic (online) version of the encyclopedia for a period of two years…  In addition, the Author will have the right to purchase the entire set of volumes of the current print edition of the Work for personal use at a discount of 25% from the published price, copies of any work published by the publishers and currently in print, provided that all such purchases, including purchases of the Work, are paid for in advance by the Author.

George Ritzer, social theorist:

For example, when you write product reviews for Amazon.com you are enhancing the value of that site and the company; you are working for them and you are not being paid for that work…To put it baldly, the value of these computer-based businesses is based largely on the “work”- those clicks and likes- that you do for them free of charge. In a capitalist world you ought to be paid by all of them, but of course you are not paid. From the perspective of the critics of capitalism, you are being exploited by firms such as Google and Facebook (Fuchs, 2013). In fact, you are being exploited more than the paid workers in the capitalist system. Most of them are being paid relatively little, but you are paid nothing at all. Low paid work often yields great profits, but work that is unpaid leads to an even higher rate of profit.

I asked George Ritzer about this tension. He wrote:

As you know, this is a high compliment- using my ideas.. even if only to critique me. Your point is well-taken, but I am one of the exploited low-paid workers in the quotation (on a per hour basis for the number of hours it takes to edit a 2500-entry encyclopedia…far less than the minimum wage). I am also a prosumer in this case “consuming” the entries and “producing” edits, comments, etc. If there is an exploiter here, it is Wiley-Blackwell, but this is endemic to academic publishing. When we submit articles to journals owned by them (and SAGE, etc) we are the prosumers of those articles (and others), we are paid nothing, and they are profitable companies in large part because of the free work done by authors. There’s a broader critique here.

By my calculation Wiley, an academic publisher, has earned about a billion dollars in profit since the first edition of the Encyclopedia of Sociology came out.

life gamification project: goals brunch

The premise of our life gamification project is to provide lots of microincentives to do more of the small things we’d like to do more often. For it to work, we need to revisit these incentives regularly and think about what we’re trying to accomplish, what’s working, what we’d like to change.

So we invented an institution: Goals Brunch. Every Sunday morning, we go to a restaurant and think about our goals. We bring along notebooks and scribble reflections while we eat.

Goals Brunch has three phases: Continue reading

follow your passion!

My reading kick about elite sport has involved a serious sub-kick on women’s skeleton (it’s like luge, only headfirst). Great Britain has won the gold medal in women’s skeleton the last two Olympics; both times it was the only gold medal the UK won. 2014’s winner was Lizzy Yarnold, a 25-year-old who’d only been in the sport 5 years. Her story:

Yarnold’s sporting path to glory was changed forever when she attended a UK Sport Girls4Gold initiative, where highly competitive sportswomen were chosen for specific sports if they showed the attributes to become a potential Olympic champion.

Yarnold was a promising athlete at school in Kent, excelling at heptathlon, and enjoyed horse riding and diving. But she admitted to BBC Sport: “At the Girls4Gold selection, I desperately wanted to be picked for modern pentathlon. But they said I’d be more suited to skeleton instead. I must admit I’d never heard of it but I’ve never looked back since.”

science as phoenix of global humiliation

Since we’re 13 hours ahead of Rio and I’m a sabbatical slugabed, I’ve watched very little of the World Cup. But of course the big story has been the humiliation of Brazil.

With 200 million people, Brazil is the world’s most populous country for which soccer is the national passion. They gave up 7 goals to Germany, and lost today 3-0 to Holland. Holland has 14 million people, so from population alone, the expected number of Dutch among the 11 best players on the field against Brazil is less than one.
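The back-of-envelope arithmetic behind that last claim can be sketched as follows. This is only an illustration using the round population figures quoted above; the function name `expected_in_top` and the assumption that talent is spread evenly across the combined population are mine, not part of any real model of soccer.

```python
def expected_in_top(pop_a, pop_b, top_n=11):
    """Expected number of country A's players among the top_n on the field.

    Hypothetical helper: assumes each of the top_n slots is equally likely
    to go to any person in the combined population of the two countries.
    """
    return top_n * pop_a / (pop_a + pop_b)


# Round figures from the post: Holland 14 million, Brazil 200 million.
dutch = expected_in_top(14e6, 200e6)
print(f"{dutch:.2f}")  # about 0.72, i.e. less than one player
```

The conclusion is not sensitive to the exact Dutch figure: any population below 20 million keeps the expected count under one.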

What next? No idea, but I mentioned before that I’ve been thinking about elite sport as a model phenomenon for biological-social interdependence, and that this had me reading about Australian sport. Australia has a nifty parallel to the Brazilian case. Continue reading

sociology’s sacred project

After reading Philip Cohen’s thorough and entirely apt review of Chris Smith’s new book, I did what any self-respecting academic would do. I bought the book and read it.

I’m not going to offer a thorough review here; Philip’s is, characteristically, at once substantive and devastatingly accurate. In the main, it’s a profoundly silly book by an author who has the intellectual chops, professional history, and resources to do a much, much better job. The evidentiary base is irresponsibly haphazard, interpreted disingenuously, and in several cases factually inaccurate. And the pages are filled mostly with score-settling, as if Smith has spent his illustrious career keeping an enemies list of those who have insulted him and his friends and has committed to publishing it here. There are numerous basic editing mistakes (authors’ names misspelled, idioms incorrect, verbs forgotten). In short, it reads like an extended, incoherent blog post: a particular irony since Smith spends a considerable amount of space fretting that blogging has been bad for sociology, based mostly on Sherkat’s admittedly obnoxious style.

Rather than a review, though, I want to ask whether there is a nugget or two of interest to be extracted from the book.

Continue reading

our week as walter white

[Image: katja]
“Pick a country in Eastern Europe.” “Ukraine?” “Pick a hair color.” “…Red? Why?”

For a podcast series, I was recently asked to do a dramatic reading of Violet, the text adventure game I wrote as my Secret Hobby Project of 2008. Maybe I’ll talk about that later, but suffice it to say that text adventures (much less dramatic readings thereof) are niche interests that I have no illusions correspond to the niche interests of this blog.

However: the opening 3:20 of the podcast is me recounting an altogether insane text-ing adventure that happened the last week of my stint in Paris this spring, and if curious, click here. (Also a must-click for any adenoidal-voice fetishists.)

a study in scarlet

[Figure: RedShirtOvulationGraph]
(N=100 on the left, N=24 on the right, one data point per person, observational study)

Andrew Gelman and I exchanged e-mails awhile back after I made his lexicon for a second time. That prompted me to check out his article in Slate about a study published in Psychological Science finding women were more likely to wear red/pink when at “high conception risk,” and then I read the original article.

I don’t want to get into Gelman’s critique, although notably it included whether the authors were correct to measure “high conception risk” as 6-14 days after a woman starts menstruating (see Gelman’s response to the authors’ response about this). And I’m not here to offer an additional critique of my own.

I’m just looking at the graph and marveling at the reported effect size, and inviting you to do the same. Of the women wearing red in this study, 3 out of 4 were at high conception risk. Of the women not wearing red, only 2 out of 5 were.*
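To put numbers on "marveling," here is a rough sketch that turns those two fractions into standard effect-size measures. It works only from the rounded proportions quoted above (3 in 4 and 2 in 5), not from the study's actual cell counts, so the results are approximations; the variable names are mine.

```python
# Rounded proportions at high conception risk, as quoted in the post.
p_red = 3 / 4      # among women wearing red
p_not_red = 2 / 5  # among women not wearing red


def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


risk_difference = p_red - p_not_red        # 35 percentage points
risk_ratio = p_red / p_not_red             # a bit under 2x
odds_ratio = odds(p_red) / odds(p_not_red) # roughly 4.5

print(f"risk difference: {risk_difference:.2f}")
print(f"risk ratio:      {risk_ratio:.2f}")
print(f"odds ratio:      {odds_ratio:.2f}")
```

An odds ratio in that neighborhood from a single observational measure of shirt color is what makes the graph worth staring at.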

UPDATE: Self-indulgent even by blog standards, but since I could see using this example again somewhere and it took some effort to reconstruct, I’m going to paste in the cross-tab here: Continue reading

life gamification project: the ten o’clock tally

With our Life Gamification Project, the goal is 100 points a day. But what’s a day?

The obvious answer might be that a day starts when you wake up and ends when you fall asleep. Doesn’t work. Continue reading

mastering sociology

What purpose does a terminal MA in sociology serve and what purpose should a terminal MA in sociology serve?

These questions have come to my mind after spending three years in a department with a terminal masters program (and no Ph.D. program). To partially answer the first question, I can say that there seem to be three kinds of students who enter our program:

  1. Students who want to pursue a Ph.D. program and either don’t have the credentials to be accepted currently or don’t feel like they have the credentials to be accepted;
  2. Students who receive some kind of promotion or pay for holding a Master’s degree; and
  3. Students who liked undergrad and want to continue in school with a vague idea that they want to do non-academic research (e.g., in think-tanks) who might also be, possibly, maybe-in-the-future, considering a Ph.D. program.

For the first group, the terminal MA seems to serve a defined goal. The information to make the decision is "knowable," though I am not sure how many follow solid advice not to enter Ph.D. programs. The second group probably makes the most sense since everything is on the table. The benefits seem clear for those in the second category since one could evaluate what one would pay in tuition or student loans against the expected future return in increased salary.

It is the third group which concerns me.

Continue reading

the bigfoot-black swan continuum of behavioral science

Jason Mitchell uses the example of “black swans” to argue that there is a fundamental asymmetry between positive and negative findings in psychology experiments, such that positive findings are the only meaningful findings and negative findings should not be published. The idea is that no matter how many white swans you observe, you don’t know if black swans exist; whereas if you observe one black swan, you know they do (full quote at bottom).

The problem: findings from behavioral science experiments aren’t like being able to hold a black swan by the neck and shout to everyone, “See! I told you they existed!”

Instead, you are presented with papers in which you have to trust researcher reports of what they did to produce a finding that an observed swan was darker than would be expected under the white-swan null (p < .05).

In this respect, positive experimental findings are somewhere on a continuum between Bigfoot and black swans. Continue reading

why so much psychology?

(Zeroth in a series) I’ve been interested in the sociology of psychology ever since my dissertation, but the recent dramas in social psychology have made this interest, like Tinder at the Olympic Village, “next level.” (Also, I’ve a genuinely remarkable advisee, David Peterson*, whose dissertation involves a multisite lab ethnography of psychology, and even though we’ve got nine thousand miles between us we’ve been corresponding on these issues quite a bit.)

I’m just explaining here what’s going on if you ever wonder, “Why does Jeremy talk so much about psychology?” Also, I worry that a lot of my concern about psychology appears like it’s strictly methodological, but a lot of the methodological critique adds up to a dire substantive point that I think sociologists should be extremely concerned about, but that’s a teaser for another post.

For now, let me link to one of the latest turns in the drama: a post by a Harvard psychologist arguing strongly against the value of replication at all, by as far as I can tell unwittingly following Harry Collins’s experimenter’s regress all the way to a sort of anti-replicationist fundamentalism. Continue reading

life gamification project: how we earn points

(Second in a series.) I said my beloved and I are having a good run with a system where every day our goal is to score 100 points. How do we get points?

Our core idea: instead of making any big resolutions, we provide small incentives for doing things that we’d like to do more often. Right now, I have 67 different ways that I can earn points. More, in fact, once you break everything down. I am not making this up.

Here are some of the non-work related things I can get points for: Continue reading

the lgbt movement did not, it turns out, tone it down

“Groups Debate Slower Strategy on Gay Rights” was the title of this 2004 NY Times article that I just discovered in my file drawer.* In it, the author describes a bedraggled and frustrated LGBT movement just weeks after George W. Bush had been elected to his second term.

In the past week alone, the Human Rights Campaign, the nation’s largest gay and lesbian advocacy group, has accepted the resignation of its executive director, appointed its first non-gay board co-chairman and adopted a new, more moderate strategy, with less emphasis on legalizing same-sex marriages and more on strengthening personal relationships…

One official said the group would consider supporting President Bush’s efforts to privatize Social Security partly in exchange for the right of gay partners to receive benefits under the program.

While the article quotes two academics, George Chauncey and Jonathan D. Katz, who disagreed with a sharp, “They are, of course, completely wrong,” the most interesting tiff is among politicians: Continue reading
