boy detective

So, the General Social Survey reinterviewed a large subset of 2006 respondents in 2008. They have released the data that combines into one file the respondents interviewed for the first time in 2008 and the 2008 reinterviews of the respondents originally interviewed in 2006. In a separate file, of course, you can get the original 2006 interviews for the latter people.

What has not yet been released, however, is the variable that would identify which row in the first file corresponds to which row in the second file. In other words, you know that person #438 in the reinterview data is somebody originally interviewed in 2006, but you don’t know which person in the 2006 data they are.

Well, especially because the last thing I need to be doing right now is procrastinating, that sounded like a challenge. Just as I have learned that just because there are no microwave instructions for a frozen dinner doesn’t mean you can’t microwave it, just because there isn’t a merge variable doesn’t mean you can’t merge the data. At least if no secure data agreement is involved.

All I have to say is: holy crap. You’d think knowing somebody’s sex, survey ballot (which was kept the same both times), zodiac sign, year of birth, self-identified race, region where they lived when they were 16, whether they lived with their parents when they were 16, whether they lived in the same place they did growing up, who they said they voted for in 2004, their marital status, their education, what they say they did for a living, how many years their mother went to school, inter alia, would allow you to pretty easily pinpoint who is who. I am here to tell you this is not the case.

I was able to devise some convoluted scheme and check how well it was doing thanks to a pretty big clue that I’ll refrain from posting, but even then there ended up being 50 cases out of 1500 where I wasn’t sure who they were. In general the experience affirmed a fundamental suspicion I’ve had about analyzing survey data: the data seem so much less real once you ask the same person the same question twice.
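For anyone wondering what the naive version of this kind of matching looks like, here is a minimal sketch. It is not the convoluted scheme mentioned above, just the obvious first pass: build a key from answers that should not change between interviews, then see which reinterview rows land on exactly one original row. The field names and the toy records are made up for illustration and are not actual GSS variable names or data.

```python
from collections import defaultdict

# Hypothetical field names chosen for illustration; not real GSS variables.
STABLE_KEYS = ("sex", "ballot", "zodiac", "birth_year", "region_at_16")


def key_of(row, keys=STABLE_KEYS):
    """Build a hashable matching key from the supposedly stable answers."""
    return tuple(row[k] for k in keys)


def match_waves(wave1, wave2, keys=STABLE_KEYS):
    """Match each wave2 row to wave1 rows that share the same key.

    Returns (matches, ambiguous, unmatched):
      matches   -- wave2 id -> the single wave1 id with the same key
      ambiguous -- wave2 id -> list of several wave1 ids with the same key
      unmatched -- wave2 ids with no wave1 row sharing the key
    """
    index = defaultdict(list)
    for row in wave1:
        index[key_of(row, keys)].append(row["id"])

    matches, ambiguous, unmatched = {}, {}, []
    for row in wave2:
        candidates = index.get(key_of(row, keys), [])
        if len(candidates) == 1:
            matches[row["id"]] = candidates[0]
        elif candidates:
            ambiguous[row["id"]] = list(candidates)
        else:
            unmatched.append(row["id"])
    return matches, ambiguous, unmatched


# Toy records (invented). b3 is meant to be the same person as a4, but
# reports a birth year that is off by one in the second interview.
wave_2006 = [
    {"id": "a1", "sex": "F", "ballot": "A", "zodiac": "Leo",
     "birth_year": 1970, "region_at_16": "South"},
    {"id": "a2", "sex": "M", "ballot": "B", "zodiac": "Aries",
     "birth_year": 1955, "region_at_16": "Midwest"},
    {"id": "a3", "sex": "M", "ballot": "B", "zodiac": "Aries",
     "birth_year": 1955, "region_at_16": "Midwest"},
    {"id": "a4", "sex": "F", "ballot": "C", "zodiac": "Virgo",
     "birth_year": 1980, "region_at_16": "West"},
]
wave_2008 = [
    {"id": "b1", "sex": "F", "ballot": "A", "zodiac": "Leo",
     "birth_year": 1970, "region_at_16": "South"},
    {"id": "b2", "sex": "M", "ballot": "B", "zodiac": "Aries",
     "birth_year": 1955, "region_at_16": "Midwest"},
    {"id": "b3", "sex": "F", "ballot": "C", "zodiac": "Virgo",
     "birth_year": 1981, "region_at_16": "West"},
]

matches, ambiguous, unmatched = match_waves(wave_2006, wave_2008)
```

In the toy data, one reinterview row matches cleanly, one ties between two originals, and one matches nobody because a single "stable" answer changed. The sketch also does not check whether two reinterview rows claim the same original row, which a real pass would need to handle.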

Author: jeremy

I am the Ethel and John Lindgren Professor of Sociology and a Faculty Fellow in the Institute for Policy Research at Northwestern University.

14 thoughts on “boy detective”

  1. Ohhhh I know … tis why I love my new job evaluating survey questionnaires and building the case for a multi-method approach to all these delicious public health puzzles … got survey data? … proceed with caution.


  2. Fascinating. I guess we will learn a lot more about data reliability after the merged data is released. Meanwhile:
    1. I would love to know more details about your convoluted matching scheme and the results.
    2. I wonder how many apparent matches this scheme would generate with two samples from the GSS that do not include the same person interviewed twice.
    3. I wonder why NORC is delaying release of the merged data for so long.


  3. That GSS thing is a great matching problem. I have tried to match CPS respondents from one year to the next knowing lots about them and having the further advantage of looking only at people living at the *same address* in the previous year. And – well it works pretty well, actually. But there are some problems, like people who divorce and remarry someone of the same sex, age, race and education level as their previous spouse. The biggest problem is actually people who age more or less than a year. If you relax that you’re OK.

    On the coding errors in the same-sex marriages, a large portion of the couples were caused by errors they have identified. It’s not so much people getting their own gender “wrong,” it’s more missed marks, double marks, incomplete erasure, etc. In ambiguous cases the data entry people were instructed to mark the first option on the item in question. For sex that meant “male,” and for relationship that meant “husband/wife”. So it’s a tiny error rate among a huge population (straight married people), combined with a coding procedure that produced a consistent error. Now they do the reasonable thing and impute those cases. (The “hot deck” says married people are probably of other sexes.)

    Other problems had to do with questionnaire layout. They did this cool study where they tracked people’s eye movements while they filled out the form. Very scientific! It’s all here:


  4. This is going to sound snarkier than it should, but this is something ethnographers have long known. Not about survey research (we know nothing about that!); but about people. They are wildly inconsistent. I believe I pointed this out once before, that when reading my work one commentator said, “but this can’t be true — what these people are saying is inconsistent!” To which I replied, “No, and yes.”

    The ethnographic solution is to spend a lot of time with people. A LOT of time. On the assumption that through sustained interaction eventually patterns emerge (people’s tendencies, or their ways of doing things). I’m not sure what the interview solution is, or the survey one. Ask enough people and you also get tendencies? Perhaps there isn’t one.

    I’m not so arrogant as to think that just because there is a solution that means it is a good one. But I’m curious about this, more for interview folks than anyone else.


    1. Naturally, survey researchers have long known this, too – people make mistakes, change their minds, interpret things differently, etc. But “wildly inconsistent” is sort of a stretch for both the GSS and ACS cases. In my expert opinion most people are normal.

      You can get the wrong answer for two hours from someone who thinks they know what you want to hear. It’s just a different process from someone checking two boxes on the gender question, or writing in “German Shepherd” for race. I’m not sure how to know which method produces more bad data.

      I like this one, though: Great evidence that interviewer perceptions of respondent race are affected by social status. Those who are “unemployed, incarcerated, or impoverished are more likely to be seen and identify as black and less likely to be seen and identify as white, regardless of how they were classified or identified previously [in the same survey].”


      1. I guess my point is that they are not “making mistakes,” or “changing their minds.” They are being what they are: inconsistent. It may be a logical mistake (you can’t be both rich and poor). But it is the way in which they think in a situated moment. I’m not sure what “normal” means. (I don’t mean that in a cute way). I mean that I don’t know. But what I find is that the logical coherence that we demand of our explanations is not always operative in the day-to-day lives of people.


      2. Oh, and PS: the ethnographic solution of sustained interaction with people to observe patterns has lots of problems. One, it assumes a kind of static thinking (people are always the same, if I spend time with them, that sameness will emerge as a product of who they are, and not, say, a pattern of emergent interaction). Two, after a while we begin to fall upon a consistency for a person and ignore things that run against it (they are exceptional curiosities, or mistakes). But I am curious if the survey solution is to get enough people and the pattern emerges within what are random inconsistencies? And I am actually much more interested in small-N, one-off methods (interviews) where neither of these solutions is available.


  5. So aside from the obviously changeable things (marital status, employment, residence), are there any patterns to the changed data?


  6. I like Shakha’s formulation that people are inconsistent and that the problem is therefore the instrument, not the people. Reminds me of the story in the Hitchhiker’s Guide to the Galaxy series where people are told their feet are the wrong size for the shoes. Or the elevator operator at an NYC hotel I stayed at once who assured me that there were plenty of elevators, but too many people who wanted to go up and down!

    On a more serious note, I’m working on final edits to the second of our two Adorno volumes on public opinion, this one to be titled Before the Public Sphere. How’s this for a quote:

    In a social scientific interview situation the interviewee faces the usually foreign interviewer, who reads him a number of questions and encourages him to reply immediately. It is obvious that, particularly outside of market research, this is not a realistic simulation of a conversation. Instead, it is a laboratory experiment in which the question functions as a stimulus. Yet, it is sometimes assumed that the reactions to these stimuli, the coded answers, have the same value as utterances made under conditions of reality. The dubiousness of this assumption and the source of error therein are examined in detail by Hofstätter, among others. We only want to point out that even when a participant is entirely willing to answer to the best of his knowledge and belief, the interview situation still influences the findings, because it requires definiteness even where it may not exist.

    (From Pollock, et al., _Gruppenexperiment: Ein Studienbericht_, Frankfurt: Europäische Verlagsanstalt 1955).

    There, I’ve done it – quoted the Hitchhiker’s Guide to the Galaxy and Adorno in the same post!


    1. Yes, the GSS will become a great resource for study of variables known to change little over time or assumed to be constant over time. The GSS core is being replicated in each wave of the panel survey, so we get multiple responses per individual over time for variables like sex and race. I am particularly pleased that the GSS panel will have multiple measures of religious participation over a short period of time. Many panel surveys in the U.S. do not measure religious participation very often. For example, the NLSY79 panel data measures religious attendance in 1979, 1982, and in 2000 (and the 2000 variable has problems–email me if interested). Anyway, this aspect of the GSS will be fun to explore when the data is merged. I suppose the new complexities will also make the GSS more daunting to first-time users.

