constructing quantitative data

I don’t envy those researchers who collect and analyze retrospective family history surveys. The data can be a real mess.

Mark Regnerus recently released the raw data from his New Family Structure Study (NFSS). I thought that many researchers would want to reanalyze the study, so I attempted to put together a Stata do file that replicates the original analyses that were published in Social Science Research. I’ve also posted my version of the full regression tables, which weren’t in the original article. The article is quite clear on how most of the items were constructed so it wasn’t too difficult to put this together for most of the variables.

The items that were most difficult were those dealing with family structure growing up. I haven’t been able to get my numbers to match Regnerus’s for all the different types of families. Largely this is because the item being measured is quite messy with multiple overlapping categories. If someone else can figure out how Regnerus got his numbers, specifically the step-parent and single parent categories, please send me the code and I’ll incorporate it in my publicly available do file.

An additional problem when analyzing the data is that an individual’s answers are often contradictory. Sometimes this raises sociologically interesting questions, such as under what conditions people change their self-identified ethnic/racial classification. Other times you just want a decent operationalization of the concept: how many respondents lived with two moms?

Regnerus’s first article didn’t really address all the complexities involved in measuring family types. He relied on a single question for same-sex relations: whether a parent did “ever have a romantic relationship with someone of the same sex” (Question S7). In response, 162 respondents said this applied to just their mom, 61 said it applied to just their dad, and 11 said it applied to both parents.

One of the criticisms leveled at Regnerus was that this was a bad operationalization of same-sex parenting. Regnerus took this criticism seriously and, in a follow-up analysis, further split the group of people who reported having a mother involved in a same-sex relationship into those who had “spent time in residence with mother’s same-sex romantic partner (N = 85)” and those who “never lived with mother’s same-sex romantic partner (N = 90)” (page 1372).

This number is presumably based on the calendar that respondents were asked to fill out listing what years they lived with what adults when growing up (Q21). On that calendar, 85 people reported living at least four months with their “mother’s girlfriend/partner.” However—and this is where it gets tricky—a different question (S8) asked, “Did you ever live with your mother while she was in a romantic relationship with another woman?” Eight people who reported in the calendar that they lived with their mother’s girlfriend answered no to this question.
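A consistency check like this one amounts to cross-tabulating two indicators and listing the cases where they disagree. The sketch below is only an illustration in Python (my replication file is a Stata do file); the five records and the field names `lived_with_moms_gf` and `s8` are made up for the example and are not NFSS data or NFSS variable names.

```python
from collections import Counter

# Hypothetical toy records, NOT NFSS data. One dict per respondent.
# lived_with_moms_gf: True if the calendar (Q21) lists any year with
#                     "mother's girlfriend/partner" (assumed coding)
# s8: answer to "Did you ever live with your mother while she was in a
#     romantic relationship with another woman?" (assumed coding)
respondents = [
    {"id": 1, "lived_with_moms_gf": True, "s8": "yes"},
    {"id": 2, "lived_with_moms_gf": True, "s8": "no"},
    {"id": 3, "lived_with_moms_gf": False, "s8": "yes"},
    {"id": 4, "lived_with_moms_gf": False, "s8": "no"},
    {"id": 5, "lived_with_moms_gf": True, "s8": "yes"},
]

# Two-way table: (calendar flag, S8 answer) -> number of respondents.
crosstab = Counter((r["lived_with_moms_gf"], r["s8"]) for r in respondents)

# Cases where the calendar says "lived with mom's girlfriend" but S8 says no.
contradictory_ids = [
    r["id"] for r in respondents
    if r["lived_with_moms_gf"] and r["s8"] == "no"
]

for (calendar_flag, s8_answer), n in sorted(crosstab.items()):
    print(f"calendar={calendar_flag!s:5} S8={s8_answer:3} n={n}")
print("contradictory ids:", contradictory_ids)
```

Listing the contradictory ids, rather than just counting them, makes it easier to inspect the individual cases by hand.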

Looking at the annual calendar data for when a person reports living with their biological mother, 23 of the 85 respondents report at least one year when they lived with their mother’s girlfriend but not their biological mother. This could reflect the complexity of family life, or it could be the result of misclicks. For example, four people report living for a time with their biological dad and their mother’s girlfriend but not their mother.
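The year-by-year check can be sketched as follows. This is a Python illustration under assumptions: the calendars, the respondent labels, and the adult codes (`bio_mom`, `bio_dad`, `moms_gf`) are hypothetical stand-ins, not the actual layout of the Q21 calendar variables.

```python
# Hypothetical calendars, NOT NFSS data: for each respondent, map the
# respondent's age to the set of adults reported in the household that year.
calendars = {
    "A": {5: {"bio_mom", "moms_gf"}, 6: {"bio_mom", "moms_gf"}},
    "B": {10: {"bio_dad", "moms_gf"}, 11: {"bio_mom", "moms_gf"}},
    "C": {3: {"moms_gf"}},
}

def years_with_gf_but_not_mom(calendar):
    """Return the ages at which mom's girlfriend is listed but mom is not."""
    return sorted(
        age for age, adults in calendar.items()
        if "moms_gf" in adults and "bio_mom" not in adults
    )

# Keep only respondents with at least one such year.
flagged = {}
for respondent_id, calendar in calendars.items():
    odd_years = years_with_gf_but_not_mom(calendar)
    if odd_years:
        flagged[respondent_id] = odd_years

print(flagged)
```

In this toy example respondent B mirrors the pattern described above: a year with the biological dad and the mother’s girlfriend listed, but not the mother.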

It’s been widely noted that the Regnerus sample includes only two people who spent their whole lives with two moms. I think this estimate might be too high by half. Two respondents checked that they “always” lived with their “mother’s girlfriend/partner.”  However, one (case id #11118) of these two also reported he never lived with his mom while she was in a same-sex relationship (S8).  The other (#11825) reported that in addition to living with her mom and her mom’s girlfriend her whole life, she was living at least four months a year with her biological father her whole life and another unspecified person, complicating this case as well.

By my calculation, only 59 of the 85 people Regnerus counts as having “spent time in residence with mother’s same-sex romantic partner” both report ever living with their mom while she was dating a woman and report living with their mom and her girlfriend in the same year. In total, at most 1/3 (68 out of 236) of the people coded in the original article as having a “lesbian mother” or “gay father” report living with a parent and his/her partner while the parent was in a same-sex romantic relationship.
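For readers who want the shares behind these counts, the arithmetic is below. The four counts come straight from the paragraph above; the variable names are just labels chosen for this Python sketch.

```python
# Counts taken from the text above; nothing here is re-derived from the data.
consistent_mothers = 59   # report S8 = yes AND same-year co-residence
counted_by_regnerus = 85  # "spent time in residence with mother's same-sex romantic partner"
consistent_all = 68       # consistent cases across both parent groups
coded_in_article = 236    # coded "lesbian mother" or "gay father" in the original article

share_mothers = consistent_mothers / counted_by_regnerus
share_all = consistent_all / coded_in_article

print(f"{share_mothers:.1%} of the 85")   # 69.4% of the 85
print(f"{share_all:.1%} of the 236")      # 28.8% of the 236
```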

For some people, this messy data is likely more evidence of the instability associated with many adult relationships. Others might blame the fact that the data was gathered over the web, or argue that a national survey is the wrong way to collect data when the category of interest is less than 0.5% of the population.

Another interpretation is that these retrospective questions are hard to answer and that social scientists should be quite wary in interpreting the results. For example, one of Regnerus’s interesting findings was that the children of fathers who had relationships with men are more likely to vote than others. In the weighted survey data, they are about 40% more likely to say yes to “Did you vote in the last presidential election, in 2008?” (Q110). This effect, however, is largely driven by the fact that people were much more likely to say yes to this if they were polled during the 2012 Presidential primary season, and the NFSS didn’t interview people whose parents were married the whole time they were growing up during this last phase of the data collection.

Looking at just the subset of people who reported that their parents weren’t together their whole childhood, those polled in the final wave of the survey (based on case id number) are significantly more likely to report voting in 2008 than those who were polled earlier, with reported participation jumping from 52% to an implausible 79%. Since this late period of polling was conducted during a time in which only some states were holding elections, and those elections were relatively low-turnout Republican primaries and caucuses, it is unlikely that the effect is caused by people reporting that they voted in 2012.  Instead, it suggests to me that when everyone is talking about voting, people are likely to say they voted, even if they didn’t.
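One common way to compare two reported rates like these is a pooled two-proportion z-test, sketched below in Python. This is an illustration only, not necessarily the test behind the comparison above: it ignores the survey weights, and the group sizes (1000 and 100) are invented so that the toy rates match the 52% and 79% quoted above. They are not the actual NFSS counts.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# HYPOTHETICAL group sizes, chosen only to reproduce 52% and 79%.
earlier_yes, earlier_n = 520, 1000   # polled before the final wave
late_yes, late_n = 79, 100           # polled in the final wave

z, p_value = two_proportion_z(earlier_yes, earlier_n, late_yes, late_n)
print(f"z = {z:.2f}, two-sided p = {p_value:.2g}")
```

With real survey data, a design-based test that accounts for the weights would be the safer choice; the unweighted version here only shows the shape of the comparison.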

If current context so powerfully impacts reporting of relatively recent and uncontroversial events, it is hard to believe it has no effect on recall and reporting of events, attitudes and behaviors from childhood and adolescence.

7 Comments

  1. Posted December 10, 2012 at 7:52 pm | Permalink

    Verrrry interesting.

  2. Posted December 10, 2012 at 10:18 pm | Permalink

    Ignoring the politics etc. of this particular case, the broader lesson, of course, is that data are always messy around the edges if you really look at them. The most meaningful findings are the ones that hold up under various ways of resolving the messiness and are not a function of one particular way of doing it. If the findings bounce around depending on the details of your measurement decisions or model specifications, I say the instability of results is your main finding, and that’s a finding. Or it’s a finding that the data are not good enough to resolve the matter.

    I say this because beginners who first learn about the realities of messy data think that means that they just should not do research at all, or come up with some story for themselves about how qualitative methods are somehow immune from comparable (if specifically different) problems.

    • Posted December 11, 2012 at 9:05 am | Permalink

      Joseph Young has a post up today at Political Violence @ a Glance on the messiness of cross-national indexes and some advice on building your own index. He echoes your sentiment that it can be a rookie mistake to abandon the project just because you happen to know about the uncertainties involved in the measurement.

      • Posted December 12, 2012 at 1:52 am | Permalink

        Very interesting post, and interesting link too.

    • Posted December 11, 2012 at 11:17 am | Permalink

      Using data that doesn’t quite ask the right questions, like GSS, is better than using low-quality untested data (including qualitative) designed to answer your question directly. When results are reduced to a tweet it’s hard to tell the difference between slipshod and tried-and-true, but the difference is real. Those billions spent on Census and Ad Health are good for something.

      That’s in general. In this case it’s safe to say research and data quality wasn’t the priority.

  3. Posted December 20, 2012 at 11:38 pm | Permalink

    Thank you for this work, Dr. Caren. This does not surprise me a bit. I spent a significant amount of time poring over the code book, which showed that there was a lot of dirty data, so I am not surprised by the results you are reporting.

    May I suggest, Dr. Caren, that you contact Dr. Regnerus and simply ask him for the record ID numbers of those he classified as being raised in a step family, and those he classified as raised in a single parent family, in his original study and in his non peer reviewed rebuttal report?

    Between his first report and his second report, he says that in his second report he removed just under 400 records which he had previously classified in his first report in the “Other” category. But I did an analysis and found that he did not remove these approx 400 records from ONLY the “Other” category, he had removed records that were listed in his first report as step parent and also single parent categories. I am not even trained in the Social Sciences, and I myself found a lot of errors. Here is an Error Report
    http://www.scribd.com/doc/105616753/NFSS-Regnerus-Errors

    One statistic I am most interested in looking at is the one where Regnerus reports, I think it is 23% of the respondents who said either their father or their mother had a same sex romance also reported that they were sexually touched, or forced to have sex while growing up WITH/BY a parent or other adult caregiver. I am on an iPad so I don’t have the code book to go look up the question number. This statistic seems highly suspect to me, I would love someone to really look into that because that particular statistic has been literally shouted from the conservative anti gay websites, “gays are pedophiles!” Thank you in advance to anyone who will analyze that question.

  4. Posted December 20, 2012 at 11:42 pm | Permalink

    I should add the report of those approx 400 records that were removed. Here is the analysis that shows he removed records also from step parent and single parent categories http://www.scribd.com/doc/105614489/NFSS-Rebuttal-Report-Non-Disclosure-of-Records-Removed

3 Trackbacks

  1. [...] I’m passionate about open-source science, so I had to give Big Ups to Neal Caren who I just learned is sharing code on github.  His latest offering  essentially replicates the Mark Regnerus study of children whose parents had same-sex relationships.  The writeup of this exercise is at Scatterplot. [...]

  2. [...] It can be just as “fuzzy” as qualitative data. [...]

  3. By Teaching the Regnerus Controversy | milieuXmorass on December 11, 2012 at 9:45 pm

    [...] from fairly soon after the article was published. I’m assuming it’s not yet over as this very good blog from Neal Caren appeared just a few days ago at [...]
