Evaluation of semi-automated record screening methods for systematic reviews of prognosis studies and intervention studies

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Systematic reviews (SRs) synthesize evidence through a rigorous, labor-intensive, and costly process. To accelerate the title-abstract screening phase of SRs, several artificial intelligence (AI)-based semi-automated screening tools have been developed to reduce workload by prioritizing relevant records. However, their performance is primarily evaluated for SRs of intervention studies, which generally have well-structured abstracts. Here, we evaluate whether screening tool performance is equally effective for SRs of prognosis studies that have larger heterogeneity between abstracts. We conducted retrospective simulations on prognosis and intervention reviews using a screening tool (ASReview). We also evaluated the effects of review scope (i.e., breadth of the research question), number of (relevant) records, and modeling methods within the tool. Performance was assessed in terms of recall (i.e., sensitivity), precision at 95% recall (i.e., positive predictive value at 95% recall), and workload reduction (work saved over sampling at 95% recall [WSS@95%]). The WSS@95% was slightly worse for prognosis reviews (range: 0.324-0.597) than for intervention reviews (range: 0.613-0.895). The precision was higher for prognosis (range: 0.115-0.400) compared to intervention reviews (range: 0.024-0.057). These differences were primarily due to the larger number of relevant records in the prognosis reviews. The modeling methods and the scope of the prognosis review did not significantly impact tool performance. We conclude that the larger abstract heterogeneity of prognosis studies does not substantially affect the effectiveness of screening tools for SRs of prognosis. Further evaluation studies including a standardized evaluation framework are needed to enable prospective decisions on the reliable use of screening tools.
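The three performance measures named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of work saved over sampling, WSS@r = (records left unscreened when recall r is reached) / (total records) - (1 - r), and a simulated screening order given as a list of 0/1 labels. The function name `metrics_at_recall` and the toy data are hypothetical; this is not the ASReview implementation and may differ in detail from how the study computed its values.

```python
import math


def metrics_at_recall(ranked_labels, target_recall=0.95):
    """Compute recall, precision, and WSS at the point where target recall is reached.

    ranked_labels: list of 1 (relevant) / 0 (irrelevant), in the order in which
    the records would be presented to the reviewer (assumed input format).
    """
    n_total = len(ranked_labels)
    n_relevant = sum(ranked_labels)
    if n_relevant == 0:
        raise ValueError("at least one relevant record is required")

    # Smallest number of relevant records that meets the target recall.
    needed = math.ceil(target_recall * n_relevant)

    found = 0
    n_screened = 0
    for n_screened, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            break

    recall = found / n_relevant
    # Precision at the target recall: relevant records among those screened so far.
    precision = found / n_screened
    # Work saved over sampling: share of records left unscreened,
    # minus the share a random screening order would leave at the same recall.
    wss = (n_total - n_screened) / n_total - (1 - target_recall)
    return {
        "n_screened": n_screened,
        "recall": recall,
        "precision": precision,
        "wss": wss,
    }


# Toy example (invented data): 100 records, 10 relevant,
# with the last relevant record found at position 40.
toy_ranking = [1] * 9 + [0] * 30 + [1] + [0] * 60
result = metrics_at_recall(toy_ranking)
print(result)
```

Under this definition, a review with a larger share of relevant records tends to show higher precision and lower WSS at the same recall, because more records must be screened before 95% of the relevant ones have been found. This is consistent with the direction of the differences reported above.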

Original language: English
Pages (from-to): 975-989
Number of pages: 15
Journal: Research Synthesis Methods
Volume: 16
Issue number: 6
Early online date: 22 Jul 2025
Publication status: Published - Nov 2025

Keywords

  • active learning
  • clinical guideline development
  • large language models
  • prioritized screening
  • semi-automation
  • systematic reviews
