When I began studying ethnic matching in the early 2000s, the empirical basis of the field was much thinner than it is now. Most of the work available at that point was correlational. Observers had noticed, repeatedly, that students of color whose teachers shared their racial or ethnic background tended to perform better on various outcomes, but the studies available could not separate that association from the long list of other variables moving alongside it. Critics were right to point out that a student in a matched classroom was also more likely to be in a particular school, a particular neighborhood, or a particular track, and that any of those other factors might be doing the work the research was crediting to matching. For a while, the critique stood, and the field struggled to move past it.
What changed, starting in the late 2000s and accelerating through the following decade, was a shift in methodology. Thomas Dee’s early work on the Tennessee STAR data was among the first pieces to use a randomized experiment, a study originally designed to measure class-size effects, as a natural setting for estimating matching effects. Dee’s finding, that same-race student-teacher pairings produced small but real improvements in reading and math scores for Black students, was difficult to dismiss on methodological grounds. Randomization took most of the selection argument off the table.
Over the decade that followed, the field developed more rigorous designs. Seth Gershenson, Cassandra Hart, Joshua Hyman, Constance Lindsay, and Nicholas Papageorge built an extended research program around North Carolina administrative data, eventually publishing work in the American Economic Journal: Economic Policy showing that Black students who had at least one Black teacher in grades three through five were substantially more likely to graduate high school and enroll in college. For the lowest-income Black boys in the sample, having a single Black teacher in elementary school reduced the probability of dropping out of high school by over thirty percent. Anna Egalite, Brian Kisida, and Marcus Winters produced a broad Florida study that found directionally consistent effects across a range of subgroups. Other studies extended the framework to Latino students and Latino teachers and found similar patterns. And the literature on teacher expectations, developed in part by the same research team, established that non-Black teachers hold systematically lower expectations of Black students than Black teachers do of the same students, controlling for measured performance, which gave the matching literature a plausible mechanism to point to.
My own contributions have been threefold, and I will name them because they are part of what the field has produced. An early study I published with Alonzo Davis in 2009 drew on administrative data to estimate matching effects across a broader range of outcomes than previous work had addressed. That paper has accumulated over four hundred citations since, which I mention not as a boast but as an indicator of the role it has played as one of the stepping stones into the modern literature. In 2019, I published a book-length treatment of the matching research that pulled the then-existing evidence into a single volume and positioned it inside a broader theoretical argument about the cultural context of educational knowledge. Most recently, my commentary with Jemimah Young and Devin Williams, “Still Searching for a Match,” argues that the field’s next move has to be toward a justice-centered framing, one that treats matching not only as a quantitative question but as an ethical one tied to the history of who has been allowed to teach, and where.
The field now sits in a different place from where it sat when I entered it. The core finding, that matching produces meaningful improvements in a range of outcomes for students of color, is no longer seriously contested by researchers who engage honestly with the empirical literature. What remains contested, and appropriately so, is the mechanism, which I will take up in a separate essay because it deserves its own treatment. Other open questions include how the effects interact with pedagogy, whether matching is the right frame for populations the original research did not anticipate, and what a policy response that takes the evidence seriously would actually look like in practice.
The field has matured. It has not closed. That is a healthy place for a body of research to be, because the questions that remain are sharper than the ones the field began with, and the evidence base is now strong enough to support the next set of investigations without having to relitigate the earlier ones.
DEB
