TY - JOUR
T1 - Opportunities of natural language processing for comparative judgment assessment of essays
AU - De Vrindt, Michiel
AU - Tack, Anaïs
AU - Van den Noortgate, Wim
AU - Lesterhuis, Marije
AU - Bouwer, Renske
N1 - Publisher Copyright:
© 2025 The Author(s)
PY - 2025/6
Y1 - 2025/6
N2 - Comparative judgment (CJ) is an assessment method commonly used to evaluate essay quality, in which assessors compare pairs of essays and judge which essay in each pair is of higher quality. A psychometric model is then used to convert the judgments into quality scores. Although CJ yields reliable and valid scores, its widespread implementation in educational practice is hindered by its inefficiency and limited feedback capabilities. This conceptual study explores how Natural Language Processing (NLP) can address these limitations, drawing on existing NLP techniques and the very limited research on their integration within CJ. More specifically, we argue that, at the start of the assessment, initial essay quality scores could be predicted from the essay texts using NLP, mitigating the cold-start problem of CJ. During the CJ assessment, NLP could be used to construct selection rules that efficiently increase the reliability of the scores while supporting assessors by avoiding overly difficult comparisons. After the CJ assessment, NLP could automate feedback, helping to clarify how assessors arrived at their judgments and to explain the scores to assessees (students). To support future research, we provide an overview of appropriate methods based on existing research and highlight important considerations for each opportunity. Ultimately, we contend that integrating NLP into CJ can significantly improve the efficiency and transparency of the assessment method while preserving the crucial role of human assessors in evaluating writing quality.
AB - Comparative judgment (CJ) is an assessment method commonly used to evaluate essay quality, in which assessors compare pairs of essays and judge which essay in each pair is of higher quality. A psychometric model is then used to convert the judgments into quality scores. Although CJ yields reliable and valid scores, its widespread implementation in educational practice is hindered by its inefficiency and limited feedback capabilities. This conceptual study explores how Natural Language Processing (NLP) can address these limitations, drawing on existing NLP techniques and the very limited research on their integration within CJ. More specifically, we argue that, at the start of the assessment, initial essay quality scores could be predicted from the essay texts using NLP, mitigating the cold-start problem of CJ. During the CJ assessment, NLP could be used to construct selection rules that efficiently increase the reliability of the scores while supporting assessors by avoiding overly difficult comparisons. After the CJ assessment, NLP could automate feedback, helping to clarify how assessors arrived at their judgments and to explain the scores to assessees (students). To support future research, we provide an overview of appropriate methods based on existing research and highlight important considerations for each opportunity. Ultimately, we contend that integrating NLP into CJ can significantly improve the efficiency and transparency of the assessment method while preserving the crucial role of human assessors in evaluating writing quality.
KW - Automated essay scoring
KW - Comparative judgment
KW - Hybrid human-AI
KW - Natural language processing
KW - Partial-automation
UR - http://www.scopus.com/inward/record.url?scp=105004877698&partnerID=8YFLogxK
U2 - 10.1016/j.caeai.2025.100414
DO - 10.1016/j.caeai.2025.100414
M3 - Article
AN - SCOPUS:105004877698
SN - 2666-920X
VL - 8
JO - Computers and Education: Artificial Intelligence
JF - Computers and Education: Artificial Intelligence
M1 - 100414
ER -