Organizations (and the authority figures within them) are frequently called on to make consequential decisions about individuals. These decisions range from whom to admit to a selective undergraduate institution or graduate program, to which mortgage applications to accept, to which prisoners to parole. The organizations and individuals have at their disposal varying kinds of information, which are perceived as being differently valuable in making those decisions. For example, in undergraduate admissions, we may know a student’s GPA, their SAT score, their class rank, the extracurriculars they participated in, and so on. Some schools may weight the SAT heavily, others the GPA, and so on. In the past few decades, there has been a decided turn towards treating behavioral information as particularly valuable across a variety of fields, including finance (the turn to behavioral credit scoring, which relies primarily on variables like past defaults and late payments) and criminal justice.
Bernard Harcourt has written extensively about the actuarial/predictive turn in criminal justice, especially in the context of parole decisions. In a separate paper, Risk as a Proxy for Race, Harcourt argues that the use of behavioral measures in making parole decisions tends to reinforce racial disparities. Through 1970, parole decisions were based on explicit demographic factors including nationality, race and religion. In the 1970s, parole systems refocused their efforts on a small number of behavioral variables, foremost among them “prior criminal history.” But prior criminal history is largely a reencoding of other racial disparities in the criminal justice system. Black potential parolees tend to have more prior offenses than white potential parolees, not necessarily because of any greater propensity for criminality, but because of bias at pretty much every point in the system – from differential policing to prosecutorial discretion to juries to sentencing laws.
What’s all this got to do with elementary schools? Today, I came across a paper on racial disparities in elementary school suspensions that reminded me of Harcourt’s discussion. This paper claims that past research showing racial disparities missed a key factor, as stated in the paper’s title: Prior problem behavior accounts for the racial gap in school suspensions. Wright et al. argue that “Differences in rates of suspension between racial groups … appear to be a function of differences in problem behaviors that emerge early in life, that remain relatively stable over time, and that materialize in the classroom.” This argument contrasts with earlier research that emphasized how “cultural bias harbored by teachers and school officials influences the subjective appraisals of the behavior of white and black students in a way that penalizes black youth.” Wright et al. are claiming, in other words, that the problem isn’t culturally biased teachers, but badly behaved black kids.
I admit that I’m out of my substantive element here, but I don’t buy the paper’s findings, for reasons inspired by Harcourt’s critique. Measures of past behavior are never simply measures of an individual’s underlying personality or disposition to behave well or badly, nor even of their actual behavior (whatever that means). These measures are created in a particular context, and thus carry with them the biases of that context. Just as a criminal record captures all of the discrimination that produced that record, measures of past problem behavior presumably carry with them the same kinds of discriminatory biases that other researchers argue are present in elementary school suspension decisions. Yes, past problem behavior seems to statistically account for much of the racial gap in school suspensions*, but I’m not convinced that this finding undermines claims of bias so much as it points to the continuity of bias.
Here the mechanism would not necessarily be the record itself (as in the case of parole screenings, where past behavior is literally a variable in a model that determines parole decisions), but rather similarities between the decision-making context of teachers reporting how badly behaved students are (the main independent variable is an average of teachers’ reports from kindergarten, 1st, and 3rd grade) and the context of teachers deciding whether or not to suspend a student (in 8th grade). Wright et al. address one potential concern with their interpretation – a labeling-theory story, whereby teachers’ negative assessments early in childhood lead later teachers to assess those children more harshly – but they don’t address what to me is a deeper problem: what if the kindergarten, first-, and third-grade teachers are themselves racially biased in their evaluations of problem behavior? Put another way, is it possible that white kids have to act out a lot more in order to be labeled a problem?** If that’s the case, then all the study is showing is a version of Harcourt’s finding: evaluations of past behavior encode systematic biases, and risk becomes a proxy for race.
And that’s what really interests me about this case. We have to be very careful in how we understand behavioral data in the context of organizational decision-making. Instead, we seem all too willing to take records of past behavior as evidence of someone’s essence, and to justify future decisions based on those recorded offenses.
* Though Wright et al. make this claim based on the frustrating argument that a particular variable in a logistic regression goes from significant to insignificant when controls are added: “The inclusion of a measure of prior problem behavior reduced to statistical insignificance the odds differentials in suspensions between black and white youth.” But as Gelman and Stern note, the difference between significant and insignificant may not itself be statistically significant. And as far as I can see, the authors do not run an actual test of whether the two coefficients differ – and notably, the coefficient on race is still positive even with controls for past problem behavior (suggesting racial bias). Compare Table 2, models 1 and 2: the odds ratio on the race variable is 1.80 (robust SE: .331) without controls and 1.20 (robust SE: .285) with them. I’m not good at eyeballing these things, and I’m not sure what the relevant test is for a model like this, but I’m not sure there’s evidence of a significant difference. Any quant reader want to chime in here?
** Hint, yes.
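For any quant readers wanting a starting point: here is a rough back-of-the-envelope version of the comparison in the first footnote, sketched in Python. It assumes the reported robust standard errors are on the odds-ratio scale (so they need a delta-method conversion to the log-odds scale), and it treats the two estimates as independent, which they are not – models 1 and 2 are nested models fit on the same sample – so this is a crude plausibility check, not the correct test (Clogg et al.’s method for comparing coefficients across nested models would be one proper approach):

```python
import math

# Odds ratios and robust SEs on the race variable,
# Wright et al., Table 2, models 1 and 2.
or1, se1 = 1.80, 0.331   # model 1: no control for prior problem behavior
or2, se2 = 1.20, 0.285   # model 2: with control for prior problem behavior

# Move to the log-odds scale; delta method: se_log ≈ se_or / or
# (this assumes the reported SEs are on the odds-ratio scale).
b1, b2 = math.log(or1), math.log(or2)
s1, s2 = se1 / or1, se2 / or2

# Naive z-statistic for the difference between the two coefficients,
# (wrongly) treating the estimates as independent.
z = (b1 - b2) / math.sqrt(s1**2 + s2**2)
print(round(z, 2))  # → 1.35
```

Under those (loud) assumptions, z comes out around 1.35, well below the conventional 1.96 cutoff – consistent with the suspicion that the drop from 1.80 to 1.20 may not itself be statistically significant.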