The Massachusetts Board of Elementary and Secondary Education
Update on Automated Test Scoring
At the special meeting on January 14, 2019, I will update the Board of Elementary and Secondary Education (Board) on our progress in investigating the use of automated test scoring to help score English Language Arts essays on the next-generation MCAS tests. I first discussed the possibility of using this type of scoring with the Board in May 2018, and we have continued to conduct analyses and research. Automated essay scoring has the potential to allow us to report test results to students, their parents, and schools more quickly than at present.
Background
Since the beginning of the MCAS program, student responses to multiple-choice and other selected-response questions have been scored electronically, while open-ended questions have been scored by qualified, trained, and monitored human scorers in scoring centers. The scorers read hundreds of student responses and assign scores based on rubrics and student exemplars. This model is still in place for all of our MCAS tests.
As automated scoring systems, usually called engines, have become increasingly reliable, meeting or even exceeding the reliability metrics used for human scorers, an increasing number of states and other testing programs have adopted them. These engines allow computers to assign scores to essays after being calibrated for each test item on thousands of sample responses that have previously been scored by human scorers. The engines appear to offer advantages for large-scale assessment programs, including the ability to score quickly, to apply the same scoring algorithm consistently, and to employ sophisticated routing techniques.
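The Department's engine and its calibration procedure are not described in detail here, but the general approach, training a statistical model on responses already scored by humans and then using it to predict scores for new responses, can be illustrated with a brief sketch. The example below is hypothetical: it uses generic text features and a ridge regression from scikit-learn, not the actual MCAS engine or its features.

```python
# Hypothetical sketch of calibrating a scoring model for a single essay item.
# This is NOT the MCAS engine; it only illustrates the general idea of
# training on human-scored responses and predicting scores for new ones.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# In practice, thousands of responses to one item, each with a human-assigned rubric score.
training_responses = [
    "The author supports the claim with evidence from the passage ...",
    "I think the story was good because ...",
]
human_scores = [3, 1]  # rubric scores assigned by trained human scorers

# Calibrate (train) a simple model for this item only.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge())
model.fit(training_responses, human_scores)

# Score new responses; a real engine would also round and clip predictions to the
# rubric scale and could route unusual responses to human scorers for review.
new_responses = ["In the passage, the author argues ..."]
predicted_scores = model.predict(new_responses)
```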
Analyzing the Use of Automated Scoring on Next-Generation MCAS Essays
At the May 2018 meeting of the Board, I presented results from our pilot study of one 2017 next-generation MCAS essay, for which we used the automated engine to rescore student responses after the operational testing and reporting were completed. The results from that pilot were promising, revealing very high rates of agreement and consistency between the automated engine and expert human scorers.
Following the May meeting, we expanded our study and used the automated engine to rescore one essay question in each grade of the 2018 grades 3-8 tests, as well as one short-answer question in grade 4 (again, these analyses took place following the operational testing and reporting). The automated engine scored close to 400,000 student responses.
The results of the expanded study from 2018 continued to be very promising, revealing that the automated engine met or exceeded our criteria for consistency and accuracy on nearly all measures.
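The specific criteria the Department applied will be covered in the presentation materials, but agreement between an engine and human scorers is conventionally summarized with statistics such as the exact-agreement rate and quadratic weighted kappa. The sketch below shows how those two statistics could be computed for a set of responses scored by both the engine and a human; the scores shown are illustrative examples, not actual MCAS data.

```python
# Illustrative computation of common human/engine agreement statistics.
# The scores below are made-up examples, not actual MCAS results.
import numpy as np
from sklearn.metrics import cohen_kappa_score

human_scores = np.array([3, 2, 4, 1, 3, 2, 4, 3])   # rubric scores from human scorers
engine_scores = np.array([3, 2, 3, 1, 3, 2, 4, 4])  # scores from the automated engine

# Exact agreement: fraction of responses given identical scores by both.
exact_agreement = np.mean(human_scores == engine_scores)

# Quadratic weighted kappa: chance-corrected agreement that penalizes
# larger score discrepancies more heavily than adjacent ones.
qwk = cohen_kappa_score(human_scores, engine_scores, weights="quadratic")

print(f"Exact agreement: {exact_agreement:.2f}, quadratic weighted kappa: {qwk:.2f}")
```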
At the meeting on January 14, I will provide the following:
- a recap of the information from our May meeting, including our current scoring procedures and how the engine works;
- detailed results from our analysis of the automated scoring of ELA essay responses from 2018, including how the engine scored responses by subgroup and achievement level;
- information about some of the potential benefits, risks, and challenges of using automated scoring; and
- my recommendations for incorporating automated scoring into our scoring procedures going forward.
Deputy Commissioner Jeff Wulfson and Associate Commissioner Michol Stapel will join us for the discussion and answer any questions you may have.