An expert witness for the plaintiffs in Vergara v. California, Tom Kane, argued that effective teaching, and presumably ineffective teaching, in the Los Angeles schools can be measured. Vergara, of course, seeks to strike down teacher due process, and it is about as overtly anti-teacher an action as can be imagined.
In a previous post, I described ways that Kane's work on the Gates Foundation's Measures of Effective Teaching (MET) final report calls that assertion into question. Kane seemed also to base his opinion on "The Long-Term Impacts of Teachers: Teacher Value-Added and Student Outcomes in Adulthood" by Raj Chetty, John Friedman, and Jonah Rockoff. A careful reading of their methodology, however, also calls Kane's claim into question. So, this post will ask whether the evidence cited by Kane and Chetty et al. actually argues for the plaintiffs' case, or against it.
Kane testified that he applied the Chetty methodology to Los Angeles students and teachers. He made a big point of the subsequent finding that the learning loss for students in classes taught by the lowest-performing Los Angeles teachers was twice as large as the loss for students in the MET sample. But the MET sample is only 56% low-income, 8% special education students, and 13% English Language Learners. Kane did not mention that the low-income rate in L.A. is 50% higher than in the MET sample, and that the percentage of L.A.'s ELL students is nearly 300% greater.
A similar, though smaller, problem can be found in the sample used by Chetty et al. They excluded the 6% of observations in classrooms where more than 25% of students were on special education IEPs.
This points to the most likely explanation for why the estimated effects of teachers found to be "ineffective" differ between the two studies: those estimates are heavily influenced by how many harder-to-educate students are excluded. Vergara, however, seeks to fire real-world teachers. So, the case should rest on evidence about outcomes for all students, not just those who make it easier for researchers to do their jobs.
A similar question arises as to why the Chetty methodology would be used for a study of Los Angeles teachers, at least in the context of a lawsuit seeking the termination of individuals. Chetty et al. sought to measure the mean effects of high and low value-added teachers. I have no doubt that they reduced bias in average teacher effects to levels acceptable for their basic research purposes, but that does not mean they reduced errors enough for policy discussions.
As the moderate scholar Matt DiCarlo notes, the overall levels of bias in Chetty et al. may be acceptable, but they "do not preclude the fact that many individual teachers' estimates will be biased." (Emphasis is DiCarlo's.) That is doubly true, I would add, for a case that would override the duly enacted laws of the state, passed with the express intent of protecting teachers.
That brings us to the biggest difference between the qualifications in these economists' written claims and their confident, even extremely overconfident, testimony. As DiCarlo also notes with regard to "The Long-Term Impacts of Teachers," Chetty admits that the study is not adequate for judging the high-stakes, real-world use of value-added (VA), because whether "the cost of errors from using VA outweighs the benefits" is among the "important issues" that "must be resolved before one can determine whether VA should be used to evaluate teachers." (Emphasis is DiCarlo's.)
Vergara would strike down teachers' protections and, almost certainly, the plaintiffs would then seek the imposition of value-added evaluations. But Chetty did not testify about a reality he himself acknowledged would be a danger of such a policy: "using VA measures in high-stakes evaluations could induce responses such as teaching to the test or cheating."
Finally, the MET report, co-authored by Kane, called into question whether teaching "effectiveness" can be measured without other sophisticated metrics, such as student surveys, duplicate tests, and the random assignment of students to address the role of sorting. Chetty et al. do not draw on those measures. And, on the witness stand, Chetty (in contrast to other value-added scholars on all sides of the debate) seemed remarkably dismissive of the problems created by sorting. Chetty argued that his "quasi-experimental" research design was comparable to the random assignment of students in an experimental study.
Testifying in Vergara, Chetty displayed four charts, based on his quasi-experiment, documenting the effects on schools' value-added after a high or low value-added teacher transferred into those schools. I have no reason to doubt his findings, but I question whether this data should be applied to the evidence and logic underlying Vergara. Chetty was supposed to be an important witness because his judgments were informed by a huge study, spanning twenty years of data on the test results of 2.5 million students and tens of thousands of teachers.
His testimony, however, was based on a part of "The Long-Term Impacts of Teachers" that only shows the benefits of replacing 1,601 "highly ineffective" teachers. He assumes, of course, that better replacements for those teachers are available, and that can be a very dubious assumption, even for such a small number. How many qualified replacements would be available to replace the entire bottom 5% of rated teachers?
I am not qualified to assess the benefits of the dismissals which Chetty says would raise student performance in the affected grades of those few schools from 0.18 standard deviations to 0.24. But how is Chetty qualified to speculate about the effect of those policies on the tens of thousands of teachers in the rest of the system? It makes no sense to trade such small gains in such a small number of classrooms if the evaluation system backfires and drives down the effectiveness of many or most of the other 95%. The Vergara plaintiffs, however, offer no evidence that the evaluation systems they favor would not cause severe damage.
If value-added is not as valid for evaluating individuals as Chetty believes, the most likely result is that teachers will flee schools where it is harder to raise test scores. And Chetty et al. only claim that their estimates are 80% reliable in terms of the policy question at hand, firing individual teachers. They ignore the observation of education expert Bruce Baker, who notes that "a 1/5 misfire rate to generate a small marginal benefit, might still have a chilling effect on future teacher supply."
I began this analysis with the goal of comparing and contrasting Kane's and Chetty's actual research findings with the testimony and logic of Vergara. I started by contrasting Kane's seemingly simple and straightforward statement that effective teaching and, presumably, ineffective teaching can be identified. Kane's testimony indicated that his MET study, and his and Chetty's analysis of Los Angeles teachers' value-added, were the basis for his confident claim. I have shown, however, that the actual findings of their studies do not confirm it.
In the course of studying this question, I became intrigued by Chetty's testimony. In a court of law, I would have expected him to be more, not less, careful with his words. On the witness stand, he displayed a certainty far beyond anything he would dare express in a written paper. Had Chetty applied the rigor, care, and skillful analysis of his own scholarship to the issues in Vergara, he would have remained neutral or sided with the defense. So, why did Chetty allow his sophisticated research to be misused in a simplistic attack on teachers?
For that reason, I will continue this analysis in a third post. It will address three statements made on the witness stand that raise questions about what Chetty (and others?) do not know about the logistics of real schools.