Five Recommendations For Reporting On (Or Just Interpreting) State Test Scores

Matthew DiCarlo of the Albert Shanker Institute has developed five recommendations for reporters who write stories on state test scores. These recommendations are useful not only to reporters – educators would also benefit from understanding the nuances of these numbers that are increasingly used to judge worth.

1. Look at both scale scores and proficiency rates: Some states (including D.C.) don’t release scale scores, but most of them do. Scores and rates often move in opposite directions. If you don’t look at both, you risk misleading your readers (frankly, it’s best to rely on the scores for presenting trends). In addition, check whether any changes (in scores or rates) are disproportionately concentrated in a small number of grades or student subgroups.
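To see how scores and rates can diverge, here is a minimal sketch using invented numbers. The cutoff of 300, the score scale, and both cohorts are hypothetical and do not come from any state's data; the point is only that a rate counts students above a line while an average uses every score.

```python
from statistics import mean

CUTOFF = 300  # hypothetical proficiency cutoff on a made-up scale


def proficiency_rate(scores, cutoff=CUTOFF):
    """Share of students scoring at or above the cutoff."""
    return sum(s >= cutoff for s in scores) / len(scores)


# Invented scale scores for two cohorts of ten students each.
year_1 = [250, 260, 295, 295, 298, 310, 340, 360, 380, 400]
year_2 = [240, 250, 301, 302, 303, 305, 320, 330, 340, 350]

rate_change = proficiency_rate(year_2) - proficiency_rate(year_1)
score_change = mean(year_2) - mean(year_1)

print(f"Proficiency rate change: {rate_change:+.0%}")
print(f"Average score change:    {score_change:+.1f}")
```

In this made-up case the proficiency rate rises from 50 to 80 percent because three students just below the cutoff are replaced by three just above it, while the average score falls because the highest scorers did less well. A story based only on the rate would report a large gain.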

2. Changes in proficiency rates, especially small changes, should not be taken at face value: In general, rate changes can tell you whether a larger proportion of tested students scored above the (often somewhat-arbitrarily-defined) proficiency cutoff in one year compared with another, but that’s a very different statement from saying that the average student improved. Because most of the data are cross-sectional, states’ annual test score results entail a lot of sampling error (differences in the students being compared), not to mention all the other issues with proficiency rates (e.g., changes in rates depend a great deal on the clustering of students around the cutoff point) and the tests themselves. As a result, it’s often difficult to know, based on rate changes, whether there was “real” improvement. If you must report on the rates, exercise caution. For instance, it’s best to regard very small changes between years (say, 1-2 percentage points, depending on sample size) as essentially flat (i.e., insufficient for conclusions about improvement). Also keep in mind that rates tend to fluctuate – up one year, down the next. Finally, once again, the scores themselves, rather than the rates, are much better for presenting trends.
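As a rough illustration of why a 1-2 point change can be indistinguishable from noise, the sketch below computes an approximate margin of error for the difference between two years' rates. It assumes each year's tested cohort can be treated as an independent simple random sample, which is a simplification and not a method the recommendations prescribe; the function name and the example figures are hypothetical.

```python
import math


def rate_change_with_margin(p1, n1, p2, n2, z=1.96):
    """Return (change, margin) for two proficiency rates.

    p1, p2: proficiency rates as proportions (0 to 1) in years 1 and 2.
    n1, n2: number of tested students in each year.
    z:      multiplier for the margin (1.96 is roughly a 95% interval).

    Assumption: the two cohorts are independent simple random samples.
    """
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p2 - p1, z * se


# Hypothetical district: 400 tested students per year, rate goes 60% -> 62%.
change, margin = rate_change_with_margin(0.60, 400, 0.62, 400)
print(f"Change: {change:+.1%}, margin of error: +/-{margin:.1%}")

if abs(change) < margin:
    print("Treat this change as essentially flat.")
```

With 400 students per year, the margin here is several percentage points, far larger than the 2-point change. The margin shrinks as the number of tested students grows, which is why the threshold for "essentially flat" depends on sample size.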

3. Changes in rates or scores are not necessarily due to school improvements (and they certainly cannot be used as evidence for or against any policy or individual): Albeit imperfectly, test scores by themselves measure student performance, not school performance. Changes might be caused by any number of factors, many of which have nothing to do with schools (e.g., error, parental involvement, economic circumstances, etc.). If a given change in rates/scores is substantial in magnitude and shared across grades and student subgroups, it is plausible that some (but not all) of it was due to an increase in school effectiveness. It is almost never valid, however, to attribute a change to particular policies or individuals. Districts and elected officials will inevitably try to make these causal claims. The claims should be ignored, or at least identified as pure speculation.

4. Comparing average scores/rates between schools or districts is not comparing their “performance”: Unlike the other “guidelines” above, this one is about the scores/rates themselves, rather than changes in them. If one school or district has a higher average score or proficiency rate than another, this doesn’t mean it is higher-performing, nor does a lower score/rate signal lower effectiveness. The variation in average scores or rates is largely a function of student characteristics, and schools/districts vary widely in the students they serve. These comparisons – for example, comparing a district’s results to the state average – can be useful, but be careful to frame them in terms of student and not school performance.

5. If you want to get an approximate idea of schools’ relative performance, wait until states release their value-added or growth model results: Many states employ value-added or other growth models that, interpreted cautiously, provide defensible approximations of actual school effectiveness, in that they make some attempt to control for extraneous variables, such as student characteristics. When possible, it’s much better to wait for these results, which, unlike raw state testing data, are at least designed to measure school performance (relative to comparable schools in the state or district).
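The difference between raw averages and a growth-style measure can be sketched with a toy example. This is not any state's actual model: real value-added models control for many more variables and are far more elaborate. Here the only control is each student's prior-year score, the data are invented, and the names `records`, `fit_line`, and `school_summary` are hypothetical.

```python
from statistics import mean

# Invented records: (school, prior-year score, current-year score).
records = [
    ("A", 320, 335), ("A", 340, 352), ("A", 360, 371), ("A", 380, 390),
    ("B", 240, 262), ("B", 260, 283), ("B", 280, 301), ("B", 300, 322),
]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Predict current scores from prior scores across all students.
slope, intercept = fit_line([r[1] for r in records], [r[2] for r in records])


def school_summary(school):
    """Return (average current score, average actual-minus-predicted score)."""
    rows = [r for r in records if r[0] == school]
    avg_score = mean(r[2] for r in rows)
    avg_residual = mean(r[2] - (intercept + slope * r[1]) for r in rows)
    return avg_score, avg_residual


for school in ("A", "B"):
    avg_score, avg_residual = school_summary(school)
    print(f"School {school}: average score {avg_score:.1f}, "
          f"average vs. predicted {avg_residual:+.2f}")
```

In this invented data, School A has the much higher average score, yet its students score slightly below what their prior scores predict, while School B's students score slightly above. The raw average mostly reflects where students started; the residual is a crude stand-in for what a growth model tries to isolate.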