transparency as a matter of research ethics

Apparently APSA has actually taken a step toward open science as part of their ethical guidelines. According to a recent paper in Science:

The American Political Science Association in 2012 adopted guidelines that made it an ethical obligation for researchers to “facilitate the evaluation of their evidence-based knowledge claims through data access, production transparency, and analytic transparency.”

My understanding is that the ASA ethical guidelines are still stuck on the idea that people should get around to sharing data after they’ve finished all the articles they might want to write from a dataset, which as a practical matter often means “never.”

Any plans afoot to modify the ASA code of ethics?

Author: jeremy

I am the Ethel and John Lindgren Professor of Sociology and a Faculty Fellow in the Institute for Policy Research at Northwestern University.

9 thoughts on “transparency as a matter of research ethics”

  1. I just got this announcement as part of a monthly section update:

    “A New Committee to Consider Revisions to the ASA Code of Ethics

    At the annual meeting in San Francisco, the Executive Officer, Sally Hillsman, met with the Committee on Professional Ethics (COPE) and suggested that it was time to revisit the Code of Ethics. It has been more than fifteen years since any revisions have been made to the Code, and a great deal of change has taken place. At the very least, social, regulatory, and technological advances have had striking impacts on the field. Moreover, it is anticipated that the federal Department of Health and Human Services will soon announce changes to The Common Rule, which governs the vast majority of human subjects research efforts.

    To that end, an ASA committee has been formed to consider possible revisions. All members are either current or former members of COPE. It is chaired by Tom Van Valey, and the other members are Earl Babbie, Guillermina Jasso, John Kennedy, and Roberta Lessor.

    The committee is completely open to considering any topic or issue that might affect the Code. In addition, we welcome suggestions of other members who could serve as resources with regard to specific issues. To this end, the ASA has set up an email account for members and others to communicate with the committee. It is […]. If your organization has a concern, a suggestion of a topic or a resource person, or even if someone wants to volunteer to help, simply send a message to that address. You will be contacted by a committee member.

    The ASA’s Code of Ethics

    Starting in 1971, the ASA has had a working Code of Ethics approved by its membership. Since then, the ASA Code has been revised several times, most recently in 1997. Both the Code and the procedures for COPE – the ASA Committee on Professional Ethics (which is responsible for enforcing the Code) – can be found on the ASA website at

    I don’t know if the current plans address data sharing, but at least there’s a current venue to raise the issue.


  2. Some political science journals, such as the American Journal of Political Science, require data for an article to be uploaded to a public website such as the Dataverse, so that researchers interested in the reproduction data do not need to hope that the original authors abide by ethical guidelines.

    My experience and the experience of other researchers that I have talked to is that a nontrivial number of political scientists still ignore or do not comply with requests for data and code required to reproduce their research. However, if I want to reproduce a recent article from the American Journal of Political Science, I can download the data and code from the Dataverse.

    The Dataverse policy isn’t perfect, because the data are not always the raw data. If a researcher tests results with all possible combinations of fifteen different control variables and then reports results with only six controls, the dataset sometimes has only the six reported controls. Other times, raw variables are replaced with summary scales, so there’s no way to check whether the scales were constructed correctly. Still, the Dataverse policy works well overall and is better than no policy.

    It’s possible that changing the ASA guidelines will foster open science, but maybe only at the margins: the main reason for data sharing is not to catch errors in the analyses of persons ambivalent about data sharing; the main reason for data sharing, at least as I understand it, is to catch and prevent p-hacking and selective reporting of results and other questionable research practices. I suspect that the sort of researchers who skirt the ethical line in data analysis aren’t going to be moved by ethical guidelines regarding data sharing, unless the ethical guidelines have a strong enforcement mechanism. I think it would be better for open science to not rely on researchers following ethical guidelines and instead to require data to be posted as a condition of publication, with temporary data embargoes for multi-article publications, if necessary.


  3. I am curious about your thoughts, Jeremy, about the problem of creating data. What incentive does a researcher have to spend 5+ years writing proposals, responding to review committees, negotiating contracts, cleaning data, coding analysis variables, and writing codebooks when they have to release all of that data with a single paper? It seems like it would create perverse incentives for researchers to find out what someone is working on, wait until they publish a paper, demand the data, and then publish their own analysis, having done none of that background work.

    This is an honest question that I have struggled with as I think about data access. It reminds me a little bit of Gallup’s legitimate complaint about survey aggregators like Nate Silver. Without the work that Gallup does, it would be impossible for Nate Silver or Sam Wang or others to run their models.


    1. From the perspective of what would best advance the field, I think the best solution would be for data collection and preparation to be rewarded in a way that reflects their real contribution to science. That is, we should have ways to recognize the contribution of making such data available, not only the contribution of analyzing them. However, I don’t know exactly what that could look like, or how realistic it is.


    2. For the purposes of open science, I think it’s more about releasing the data that was used for a particular publication, rather than the whole of that dataset. I do think making whole data sets available earlier rather than later is something to be encouraged, and that we need a robust system of data citation in order to do that.


      1. It seems to me like there are multiple versions of open science:

        1. Share only the data and code required to perform a specific analysis
        2. Share data and code on any analysis that was run in the completion of a paper (per L.J.’s comment above)
        3. Share all data in order to verify results are robust

        I can see the case for (1) in most cases — and particularly for experiments, publicly available datasets, and other studies where the data collection is not particularly arduous. And, Jeremy, it seems like that is what you are advocating. But I have seen others make a case for (2) and (3), in which case I could imagine a whole bunch of people waiting for others to do the work and then running with the data (and initial code to do time-consuming things like recode categories into dummy variables and such).

        I agree in principle with Elizabeth’s comment above — but I don’t see how we practically get there, especially in an age when we are moving more towards quantifiable metrics of publication and impact scores. I don’t think that would be resolved with more robust data citation since it is a qualitatively different kind of endeavor. Particularly since funding agencies are cutting funding for data support and dissemination even as they are requiring P.I.s to do more of it (when NIH cuts budgets they don’t want to sacrifice “the science” but that means the hours that get cut are the documentation and dissemination).


      2. I’d put myself under (3) because I don’t know of any way to differentiate (2) from (3) other than trust in the researcher whose work is being checked, and the main reason that I support public posting of data and code is because I don’t have much trust in the first place, given the p-hacking and selective reporting that I have seen.

        I’m sympathetic to the idea that researchers should benefit from their data collection efforts, so I don’t expect a researcher to publicly post all the data from a 5-year project if that researcher has specific plans for multiple follow-up papers. But I think in that case that there is still value in a researcher publicly committing to publicly posting all the data and code at some reasonable point in the future because (1) that sort of promise of transparency provides more confidence that the researcher isn’t involved in p-hacking and selective reporting, given that others will be able to check his or her work, and (2) the intention of publicly posting data and code can only help to make the data analysis more representative, correct, and clear.

        In terms of personal careers, the balance of the incentives is against voluntarily releasing data and code: it takes time and effort to prepare the data and code for release, and there’s a greater chance that someone might find an error in the work; there’s not much on the other side of the ledger, except maybe that people in favor of open science will grant a few more reputation points or maybe you’ll get a few more citations.

        However, in terms of getting the science correct, the incentives are strongly in favor of releasing data and code because making data and code available helps keep the literature free from error and bias. Researchers who are interested in getting the science correct will find a way to permit other researchers to check their work for correctness, robustness, and representativeness; sometimes that involves making data and code available before publication, to prevent errors from entering the literature; sometimes that involves making data and code available upon publication; and sometimes that involves making data and code available after an embargo period, if there is a legitimate reason that the data should be kept private, such as a multi-paper project.

        Maybe the culture is different in political science, but it’s not uncommon for political science researchers voluntarily to post their data. For example, Lee Epstein co-authored a 2006 article on Supreme Court nominations, and she has posted updated versions of the dataset after new nominations or after a correction. As far as I can tell, devoting resources to updating the nominations dataset provides no personal benefit to her, except maybe increasing the chance that someone uses the dataset and cites her article. But, by updating the dataset, Epstein is providing a benefit to political science and to those of us who might be able to add to the discipline with an add-a-variable or split-the-sample type of analysis.


      3. The solution to this problem is property rights.

        If you invested a lot of time and effort, you deserve to exploit that data to the max. On the other hand, others need to be able to verify your work.

        To resolve this conflict, you can create licenses that specify what can be done with the data. For instance, you make your data set completely available, BUT it can only be used for replication purposes until an embargo period of X months or years has passed. After that embargo period is over, others can use your data for their own papers.

        So who creates these licenses? Preferably, ASA, APA, and so on. Academia could learn a thing or two from open source software. There are numerous licenses for almost any situation and they largely work. Currently there is an institutional void with respect to property rights and data.


  4. There’s another slice through data sharing, in which PIs on federally funded projects that collect new data are expected to share their data faster than they can necessarily get their own publications done. This is justified not in terms of transparency of results but on the grounds that public funding implies an obligation to share publicly. There is a question about whether PIs whose analysis involves reworking publicly available data have the same obligation to release their derived data sets on these grounds. Again, this is a “sharing” issue, which is different from the matter of having your published work be checkable. And I think it is this sharing issue that bothers PIs the most: they feel that others get to free ride on their work, obviously in interaction with debates about who paid them to do it.

