“Un-MET” Goals: Questions about Gates Foundation’s MET Study

In response to the release of the Measures of Effective Teaching (MET) project’s final results, which this blog posted about here, the National Education Policy Center (NEPC) has released a review of the long-awaited study on teacher evaluation that strongly questions the spin that has been put on the findings.

The MET project, funded by the Bill and Melinda Gates Foundation, released its final set of reports this month. Those reports are supposed to advise schools and districts about how to design teacher evaluations.

However, a review by the NEPC of the MET research – an ambitious, multi-year study of thousands of teachers in six school districts – finds that the study’s results were inconclusive and provide little usable guidance.

“The MET research does little to settle longstanding debates over how best to evaluate teachers comprehensively,” said Jesse Rothstein of the University of California, Berkeley. Rothstein and William Mathis, NEPC’s managing director, conducted the review for the policy center’s Think Twice think tank review project.

The MET study compared three types of teacher performance measures: student test scores, classroom observations, and student surveys. The project concluded that the three should be given roughly equal weight in teacher evaluations.

Rothstein and Mathis found that the data do not support that conclusion. Instead, the data indicate that each measure reflects a distinct dimension of teaching. Rothstein said, “Any evaluation system needs to be founded on a judgment about what constitutes effective teaching, and that that judgment will drive the choice of measures. Nothing in the MET project’s results helps in forming that judgment.”

“While we commend The Bill and Melinda Gates Foundation for investing millions of dollars in tackling critical education issues, the conclusions in this case do not jibe with the data,” said Mathis.

Teacher evaluation has emerged as a prominent educational policy issue; it was, for instance, one of several contested points during the Chicago teachers’ strike in September 2012. And it is a key element of the Obama administration’s education policy. Debate over teacher compensation, hiring and firing, which once centered on traditional salary matrices and teacher observation systems, is increasingly focused on finding concrete outcome measures – particularly, student test score gains. But these measures are controversial, as critics claim that they miss important dimensions of teacher effectiveness.

Following are some of the issues NEPC’s reviewers found with the MET study:

Samples Were Not Representative of the Teaching Force
The centerpiece of the MET study was an experiment that randomly assigned students to teachers. This experimental approach was meant to determine once and for all whether value-added (VA) scores are biased by student assignments. That is, do teachers who are assigned more successful students benefit in terms of their VA scores? But the group of teachers who participated in the MET experiment turned out not to be representative of teachers as a whole, and many participating schools failed to comply with their experimental assignments. As a result, the experiment did little to resolve the question.

No Single “Quality” Factor
Each type of measure explored in the MET study (student test scores, classroom observations, and student surveys) captures an independent dimension of teaching practice, and each provides only minimal information about the others. These results indicate that there is no single general teaching “quality” factor – or that, if there is any such factor, it accounts for only a small share of the variation in each of the measures. Rather, there are a number of distinct factors, and policymakers must choose how to weight them in designing evaluations.

MET Measures Do Not Capture Teachers’ Impacts on Conceptually Demanding Assessments
None of the three types of performance measures captures much of the variation in teachers’ impacts on alternative, conceptually demanding tests. There is little reason to believe that an evaluation system based on any of the measures considered in the MET project will do a good job of identifying teachers who are effective (or ineffective) at raising students’ performance on these more conceptually demanding assessments.

