Should student essays be graded by computers?

The question of whether, and how much, computers should be responsible for grading student work has been around for decades, but the recent advent of MOOCs and the upcoming implementation of Common Core tests have brought it to the forefront yet again. One MOOC provider, edX, has recently developed software that grades student essays, which it will make available as freeware. edX argues that its new software is not only adequate but will also increase student learning, because it provides instant feedback that encourages students to rework their essays for resubmission.

Others, such as MIT researcher Les Perelman, are highly critical of computerized grading. Perelman has written several nonsensical essays that received high grades from some of the scoring software. Those who agree with Perelman have recently founded a group, Professionals Against Machine Scoring Of Student Essays In High-Stakes Assessment, to protest the computerized scoring of essays.

Most germane to K-12 education, however, is the news that the Smarter Balanced Assessment Consortium (SBAC) and PARCC are both experimenting with computerized grading of essays in their tests leading up to Common Core implementation next school year:

Joe Willhoft, the executive director of SBAC, told Catherine Gewertz of Education Week in an email that written responses from students participating in the ongoing pilot tests will be hand-scored by the consortium’s contractor, with guidance from SBAC staff. The contractor will then use the scored responses to try to “train” artificial-intelligence software to score the papers.

Scoring, both human and artificial, will focus on three aspects of students’ writing, Willhoft explained: 1) overall organization and style (things like how well it’s written, whether the sentences are complete and coherent, and the voice and style appropriate) 2) conventions of the language, and 3) students’ use of evidence (whether the essay refers appropriately to the reading materials on which it is based). Based on what is known about computer scoring, he said, Smarter Balanced officials are more confident that it will succeed with conventions, organization, and style than with use of evidence.

They’ll divide the papers into two chunks: a training set and a validity set. Programmers will use the training set to teach the computerized scoring engine to replicate the human scores. They’ll use the validity set to see if the software actually replicates the human scores. With that feedback in hand, SBAC will get its arms around the reliability of computer scoring.
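To make the training-set and validity-set idea concrete, here is a minimal sketch in Python. It is not SBAC's or its contractor's actual method: the function names, the single numeric "feature" per paper, and the sample data are all hypothetical, and a straight-line fit stands in for whatever scoring engine is really used. The point is only the workflow: fit on one chunk of hand-scored papers, then check agreement with the human scores on the other chunk.

```python
import random


def split_papers(papers, train_fraction=0.5, seed=0):
    """Shuffle hand-scored papers and divide them into two chunks."""
    shuffled = list(papers)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]  # (training set, validity set)


def fit_line(points):
    """Least-squares fit of human_score = slope * feature + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        # All features identical: fall back to predicting the mean score.
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / var_x
    return slope, mean_y - slope * mean_x


def agreement_rate(model, papers, tolerance=0):
    """Share of papers where the machine score is within `tolerance`
    points of the human score."""
    slope, intercept = model
    hits = 0
    for feature, human_score in papers:
        machine_score = round(slope * feature + intercept)
        if abs(machine_score - human_score) <= tolerance:
            hits += 1
    return hits / len(papers)


if __name__ == "__main__":
    # Hypothetical data: (feature value, human score on a 1-4 scale).
    scored_papers = [
        (120, 1), (180, 2), (150, 1), (260, 3), (310, 3),
        (400, 4), (220, 2), (350, 4), (290, 3), (200, 2),
    ]
    training_set, validity_set = split_papers(scored_papers)
    model = fit_line(training_set)
    print("exact agreement:", agreement_rate(model, validity_set))
    print("within one point:", agreement_rate(model, validity_set, tolerance=1))
```

Because the validity set is never shown to the model during fitting, its agreement rate gives a fairer picture of how the software would score papers it has not seen. A single surface feature like the one used here is exactly the kind of shortcut critics such as Perelman warn about, so a real engine would need far richer inputs.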

Educators, students, and parents may be willing to accept computerized grading for Common Core only if it is tested rigorously and proven to be legitimate, but MOOCs seem ready to move ahead with computerized grading right away.

For more information, please see these two articles:

http://www.nytimes.com/2013/04/05/science/new-test-for-computers-grading-essays-at-college-level.html?pagewanted=1&_r=0

http://blogs.edweek.org/edweek/curriculum/2013/04/should_common_tests_use_computers_to_score_writing.html