what are we measuring when we measure behavior? elementary school edition

Organizations (and/or authority figures within organizations) are frequently called on to make consequential decisions about individuals. These decisions range from who to admit to a selective undergraduate institution or graduate program to which mortgage applications to accept to which prisoners should be paroled. The organizations and individuals have at their disposal varying kinds of information, which are perceived as being differently valuable in making those decisions. For example, in undergraduate admissions, we may know a student’s GPA, their SAT score, their class rank, the extracurriculars they participated in, and so on. Some schools may value SAT highly, others GPA, etc. In the past few decades, there has been a decided turn towards looking at behavioral information as particularly valuable across a variety of fields, including finance (the turn to behavioral credit scoring, which relies primarily on variables like past defaults and late payments) and criminal justice.

Bernard Harcourt has written extensively about the actuarial/predictive turn in criminal justice, especially in the context of parole decisions. In a separate paper, Risk as a Proxy for Race, Harcourt argues that the use of behavioral measures in making parole decisions tends to reinforce racial disparities. Through 1970, parole decisions were based on explicit demographic factors including nationality, race and religion. In the 1970s, parole systems refocused their efforts on a small number of behavioral variables, foremost among them “prior criminal history.” But prior criminal history is largely a reencoding of other racial disparities in the criminal justice system. Black potential parolees tend to have more prior offenses than white potential parolees, not necessarily because of any greater propensity for criminality, but because of bias at pretty much every point in the system – from differential policing to prosecutorial discretion to juries to sentencing laws.

What’s all this got to do with elementary schools? Today, I came across a paper on racial disparities in elementary school suspensions that reminded me of Harcourt’s discussion. This paper claims that past research showing racial disparities missed a key factor, as stated in the paper’s title: Prior problem behavior accounts for the racial gap in school suspensions. Wright et al. argue that “Differences in rates of suspension between racial groups … appear to be a function of differences in problem behaviors that emerge early in life, that remain relatively stable over time, and that materialize in the classroom.” This argument contrasts with earlier research which emphasized how “cultural bias harbored by teachers and school officials influences the subjective appraisals of the behavior of white and black students in a way that penalizes black youth.” Wright et al are claiming that the problem isn’t culturally-biased teachers, but badly-behaved black kids.

I admit that I’m out of my substantive element here, but I don’t buy the paper’s findings, for reasons inspired by Harcourt’s critique. Measures of past behavior are never simply measures of an individual’s underlying personality or disposition to behave well or badly, nor even of their actual behavior (whatever that means). These measures are created in a particular context, and thus carry with them the biases of that context. Just as a criminal record captures all of the discrimination that led to that record, measures of past problem behavior presumably carry with them the same kinds of discriminatory biases that other researchers argue are present in elementary school suspension decisions. Yes, past problem behavior seems to statistically account for much of the racial gap in school suspensions*, but I’m not convinced it undermines claims of bias, so much as points to the continuity of bias.

Here the mechanism would not necessarily be the actual record itself (as in the case of parole screenings, where past behavior is literally a variable in a model that determines parole decisions), but rather similarities in the decision-making context of teachers reporting how badly behaved students are (the main independent variable is an average of teachers’ reports from Kindergarten, 1st, and 3rd grade) and the context of teachers deciding whether or not to suspend a student (in 8th grade). Wright et al address one potential concern with their interpretation – a labeling theory story, whereby teachers’ negative assessments early in childhood lead later teachers to assess those children more harshly later – but they don’t address what to me is a deeper problem: what if the kindergarten, first, and third grade teachers are themselves racially biased in their evaluations of problem behavior? Put another way, is it possible that white kids have to act out a lot more in order to be labeled a problem?** If that’s the case, then all the study is showing is a version of Harcourt’s finding: evaluations of past behavior encode systematic biases, and risk becomes a proxy for race.

And that’s what really interests me about this case. We have to be very careful in how we understand behavioral data in the context of organizational decision-making. Instead, we seem all too willing to take records of past behavior as evidence of someone’s essence, and to justify future decisions based on those recorded offenses.

* Though Wright et al make this claim based on the frustrating argument that a particular variable in a logistic regression goes from being significant to insignificant when controls are added: “The inclusion of a measure of prior problem behavior reduced to statistical insignificance the odds differentials in suspensions between black and white youth.” But as Gelman and Stern note, it’s possible for the difference between significant and insignificant to not itself be significant. And as far as I can see, the authors do not run an actual test to see if the two coefficients are themselves different – and notably, the coefficient on race with controls for past problem behavior is still positive (suggesting racial bias). Compare Table 2, models 1 and 2: The odds ratio on the race variable without controls is 1.80 (rse: .331) and with controls it’s 1.20 (rse: .285). I’m not good at eyeballing these things, and I’m not sure what the relevant test is for a model like this, but I’m not sure there’s evidence there for a significant difference. Any quant reader want to chime in here?
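
For a rough sense of scale, here is a back-of-envelope sketch in Python of the comparison this footnote asks about. It rests on two assumptions that are not from the paper: that the “rse” values are robust standard errors reported on the odds-ratio scale (so dividing by the odds ratio approximates the standard error of the log odds ratio), and that the two estimates can be treated as independent, which is not true for two models fit to the same students. Treat it as illustrative only, not as the appropriate test.

```python
import math

def log_or_and_se(odds_ratio, se_on_or_scale):
    """Convert an odds ratio and its SE to the log-odds scale.

    ASSUMPTION: the reported 'rse' is a standard error on the odds-ratio
    scale, so the delta method gives SE(log OR) ~= SE(OR) / OR.
    """
    return math.log(odds_ratio), se_on_or_scale / odds_ratio

# Race coefficient, Table 2 as quoted in the post
b1, se1 = log_or_and_se(1.80, 0.331)  # model 1: no control for prior behavior
b2, se2 = log_or_and_se(1.20, 0.285)  # model 2: with control for prior behavior

diff = b1 - b2
# ASSUMPTION: estimates treated as independent (they are not; same sample)
se_diff = math.sqrt(se1**2 + se2**2)
z = diff / se_diff
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # normal approximation

print(f"difference in log odds = {diff:.3f}, z = {z:.2f}, p = {p_two_sided:.2f}")
```

Under those assumptions the z statistic comes out around 1.35, short of the conventional 1.96 cutoff. Because the calculation ignores the dependence between the two models, it cannot settle the question either way.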

** Hint, yes.

Author: Dan Hirschman

I am a sociologist interested in the use of numbers in organizations, markets, and policy. For more info, see here.

14 thoughts on “what are we measuring when we measure behavior? elementary school edition”

  1. It is possible to consider the racial bias hypothesis by observing school discipline rates for students of other races. Here are data from page 2 of this source: http://www2.ed.gov/about/offices/list/ocr/docs/crdc-discipline-snapshot.pdf. I’ll present the racial and ethnic groups in order of discrepancy rates between enrollment rates and in-school suspension rates.

    Enrollment rates (In-school suspension rates)
    Asian: 5% (1%)
    Pacific Islander: 0.5% (0.2%)
    American Indian / Alaska Native: 0.5% (0.2%)
    White: 51% (40%)
    Hispanic: 24% (22%)
    Multiracial: 2% (3%)
    Black: 16% (32%)

    It seems difficult to attribute suspension rates only to racial bias among school representatives — and in particular, kindergarten teachers — after looking at data for Asians and Pacific Islanders, unless there is a reason why kindergarten teachers and other school representatives would be biased more in favor of Asians than whites.

    Note that, in the study that the post linked to, the estimated odds ratio for the black-white suspension gap after controlling for past behavior is 1.20. But the estimated odds ratio for the male-female suspension gap after controlling for past behavior is 2.14. So we should probably also discuss the possibility of sexism among kindergarten teachers.

    My larger point is not about school discipline but about the way that concerns about such topics are considered. If the item of interest is racial disparity in school discipline, then there might be value in considering more than two racial groups. If the proposed mechanism is bias among school representatives, then there might be value in considering other types of bias by these same school representatives.

    1. L.J.: Three quick responses. First, the authors note findings about Asian vs. White suspension rates and the limitations of their data which prevent studying more than Black-White differences. So I think they are in agreement with you on that.

      Second, I’m not sure what data on Asian-White gaps can tell us about racial bias against Black students (or specifically, the idea that Black students are perhaps punished more harshly for objectively similar behavior). In the US, Black-White race relations are simply different than other sorts of dyads, and I imagine we have reason to suspect that, say, stereotypes of Black criminality and dangerousness are much stronger than for any other group (see the portrayals of Trayvon Martin, Michael Brown, etc.).

      Third, I don’t think anyone is saying that race is the *only* factor. What Wright et al are trying to argue is that it isn’t a factor at all, and I don’t think they’ve made a compelling case.

      1. Hi Dan,

        If all the data we had were for black and white students, then ingroup bias is a possible explanation. Data on Asian-white gaps can provide information about the nature of the racial bias that these kindergarten teachers are suspected of having. Given the demographics of kindergarten teachers, the suspension gap in favor of Asians lets us rule out ingroup bias as a likely explanation.

        Based on the data for all races, a reasonable alternative is a stereotype explanation that you and Mikaila mentioned or alluded to.

        I guess I misinterpreted “Here the mechanism would not necessarily be the actual record itself” to refer to an explanatory mechanism that does not necessarily involve behavior.

        In any event, I’d agree that the point estimates need to be adjusted for racial bias among school officials.

        For what it’s worth, Wright et al. do appear to make nuanced claims in the article, such as “the use of suspensions may not be as racially biased as many have argued” (p. 8), but I’d agree that it’s incorrect for them to write: “the inclusion of the measure of prior problem behavior, more importantly, completely accounted for the black-white differentials in suspensions, reducing the odds ratio for race to 1.20 (Z = .80)” (p. 5); that passage is not even internally consistent, if a 20% increase in the odds of suspension is meaningful.

  2. It’s just confirmation bias. Educators assume that Asian Pacific American students are motivated, diligent, studious, and school-oriented, more so than Whites (it’s called the Model Minority Myth). Thus, considering more than two racial groups lends further support to the notion that bias is at work here.

    1. Hi Mikaila,

      It is certainly possible that the Model Minority Myth contributes to disparities in school discipline patterns, but it’s also possible that disparities in school discipline patterns contribute to the Model Minority Myth. I don’t think that it’s possible to use one to explain the other without theory or data that help to untangle correlation and causation.

  3. On the statistical front, I have often watched a coefficient leap into “significance” as I got too many partially-collinear variables into the model. To unpack this, you’d also want an analysis that treated teachers’ judgments of behavioral problems as dependent. There is a literature that shows higher incidence of behavioral problems in early childhood for children whose parents have been incarcerated. So one approach asks whether children are being blamed in school for the ills of bad social policies. Another, harder-to-detect issue is in-school dynamics.

    My own son (now a PhD mathematician who is extremely well-controlled) was a hyper-sensitive and very difficult little boy. In first grade, in the fall term he was well-behaved in the classroom (per his teacher) but was complaining that other boys were teasing him. The teacher told him it was his problem. By the end of the school year, he was one of the “bad boys” in the first grade, the teacher hated him, he spent half his time in the principal’s office. Some of the parent volunteers thought he was being mistreated by his teacher. Resolution: a shift to a new school, a parent who also has a hot temper (yours truly) who worked with him a lot about appropriate and inappropriate ways of expressing emotion. White face, high SES, back on track in a few years. Black face, parents with fewer resources, kid gets labeled as a behavioral problem and tracked into the “bad kids” track permanently. And, yes, children in school yards DO tease and harass racial minority children, as well as children who are “weird” on a wide variety of child pecking order systems.

  4. It sounds like another way to spin those results might be that they have identified a key mechanism for suspensions (problem behavior), and that mechanism fully mediates any race effects on suspensions.

    Without looking deeper into differential rates of problem behavior (or, differential punishments for the same behavior, labeling one act a problem in some students and the same act not a problem with other students), I’m not sure it is appropriate to control for prior problem behavior. They probably don’t have any way to know (from their data) whether or not there is differential labeling of behavior (which seems likely – we know this to be the case with adults), and so PPB might mean something completely different in their data. The underlying construct of what they have measured (PPB) almost certainly varies meaningfully by race.

    It would be like controlling for traffic stops and then arguing that race isn’t a significant predictor of police attention.

  5. Rachel Fish has been using an experimental vignette study to get at how 3-4th grade teachers decide to refer children for special needs, which includes identifying the described vignette children as behavioral problems. Changing the name of the child in the story to one identified as “Black” leads to more referrals for behavior control needs and fewer for educational needs than when the same child is white, but only for boys. There is also other research on gender perceptions that shows both Black boys/girls and men/women are “overmasculinized” (attributed more male-typed characteristics and at higher levels), which is an advantage for Black girls and a risk for Black boys; conversely Asian boys/men and girls/women are “overfemininized” compared to whites, which would suggest that docility rather than threat is perceived in Asians. Since minorities are socially subordinated to whites, feminization is “appropriate” (“model minority” is woman-like in several dimensions).

  6. I think there might be an issue with the way the authors interpreted their findings. I’m not sure it shows what they think it does.

    They have a measure for parental reported bad-behavior and note that, “The delinquency measure includes items that could result in a suspension if the behavior occurred at school.”

    In their whites only model, a one unit increase in the delinquency measure multiplied the odds of being suspended by 5.77. In the blacks only model, the odds ratio was 14.5. Because logit models are wonky, you can’t directly compare the two, but my reading of this is that black kids who do “bad” stuff are punished at much higher rates than white kids who do the same stuff. This would be evidence for racial bias, not against it.

    1. If we are going to be skeptical about the accuracy of suspension rates, then let’s also be skeptical about the accuracy of parental reports of whether their child cheats, steals, and fights.

      The odds ratios for prior problem behavior are nearly identical by race (1.32 for whites, and 1.27 for blacks), which suggests that the prior problem behavior measure is not strongly correlated with the parent-reported delinquency measure (respective odds ratios of 5.77 and 14.5). There’s a literature on race differences in under-reporting of drug use (e.g., Mensch and Kandel 1988 POQ, Fendrich and Vaughn 1994 POQ, Fendrich and Rosenbaum 2003 DaAD, Fendrich and Johnson 2005 JUH). Maybe that literature is relevant to the school suspension study, and maybe not.

      It is certainly possible that teacher reports are biased by stereotypes, but it’s also possible that parent reports are biased by parental concern, mistrust of researchers, different baselines, and the fact that parental reports are being used in a model of school behavior that occurs when the child is away from the parent.

  7. Murray’s tweet simply states that researchers need to deal with these findings. Representing that as “people complaining about race inequality ignore the inherent badness of Blacks” is sort of like representing your post as “only research that supports the assertion of fundamental sameness of all racial groups should be published.”

  8. There is a test for the hypothesis of no change in a regression coefficient when controls are added (Clogg, Petkova, and Haritou, AJS 1995). I don’t think it’s very useful, because I can’t see why you would give any credence to that null hypothesis. However, it does seem to be popular in some fields.

    The main problem is the substantive one you mention–it’s possible that the reports of problem behavior in early grades include an element of racial bias. But it’s also worth noting that their model including prior problem behavior is compatible with large race effects on the probability of suspension: by my calculations, the 95% confidence interval for the odds ratio goes from about .76 to 1.9. So it illustrates the textbook lesson that “did not reject the null hypothesis” does not mean “found that the null hypothesis is true” or even approximately true.
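
The confidence interval in the last comment can be sketched in a few lines of Python. This is one plausible reconstruction, not necessarily the calculation that commenter used: it assumes the .285 quoted in the post is a standard error on the odds-ratio scale, and it uses a normal approximation on the log-odds scale.

```python
import math

odds_ratio = 1.20  # race coefficient with prior problem behavior controlled
se_or = 0.285      # ASSUMPTION: 'rse' is a standard error on the odds-ratio scale

# Delta method: SE(log OR) ~= SE(OR) / OR
log_or = math.log(odds_ratio)
se_log = se_or / odds_ratio

z = log_or / se_log
lower = math.exp(log_or - 1.96 * se_log)
upper = math.exp(log_or + 1.96 * se_log)

print(f"z = {z:.2f}, 95% CI for the odds ratio: {lower:.2f} to {upper:.2f}")
```

This gives an interval of roughly 0.75 to 1.9, in the same neighborhood as the figures quoted in that comment. The interval includes 1.0, but it also includes odds ratios well above it, which is why “not significant” here is a long way from “no race effect.”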
