Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

BACKGROUND AND OBJECTIVES: We sought to summarize the study design, modelling strategies, and performance measures reported in studies on clinical prediction models developed using machine learning techniques.

METHODS: We searched PubMed for articles published between 01/01/2018 and 31/12/2019 that described the development, or the development with external validation, of a multivariable prediction model using any supervised machine learning technique. No restrictions were made based on study design, data source, or predicted patient-related health outcomes.

RESULTS: We included 152 studies: 58 (38.2% [95% CI 30.8-46.1]) were diagnostic and 94 (61.8% [95% CI 53.9-69.2]) were prognostic. Most studies reported only the development of prediction models (n = 133, 87.5% [95% CI 81.3-91.8]), focused on binary outcomes (n = 131, 86.2% [95% CI 79.8-90.8]), and did not report a sample size calculation (n = 125, 82.2% [95% CI 75.4-87.5]). The most common algorithms used were support vector machine (n = 86/522, 16.5% [95% CI 13.5-19.9]) and random forest (n = 73/522, 14.0% [95% CI 11.3-17.2]). Values for area under the receiver operating characteristic curve ranged from 0.45 to 1.00. Calibration metrics were often missed (n = 494/522, 94.6% [95% CI 92.4-96.3]).
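
The abstract does not say how the 95% confidence intervals were computed. As an illustration only, the sketch below uses a Wilson score interval, which is an assumption on our part; for 58 of 152 studies it gives 30.8-46.1, in line with the reported figure. The function name `wilson_ci` is hypothetical.

```python
from statistics import NormalDist


def wilson_ci(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value of the standard normal, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * (p * (1 - p) / n + z**2 / (4 * n**2)) ** 0.5 / denom
    return centre - half_width, centre + half_width


# Counts taken from the RESULTS paragraph: diagnostic studies and
# development-only studies, each out of 152 included studies.
for k, n in [(58, 152), (133, 152)]:
    lo, hi = wilson_ci(k, n)
    print(f"{k}/{n}: {100 * k / n:.1f}% [95% CI {100 * lo:.1f}-{100 * hi:.1f}]")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within 0 and 1, which matters for proportions near the boundary such as 494/522.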

CONCLUSION: Our review revealed that focus is required on handling of missing values, methods for internal validation, and reporting of calibration to improve the methodological conduct of studies on machine learning-based prediction models.

SYSTEMATIC REVIEW REGISTRATION: PROSPERO, CRD42019161764.

Original language: English
Pages (from-to): 8-22
Number of pages: 15
Journal: Journal of Clinical Epidemiology
Volume: 154
Early online date: 24 Nov 2022
Publication status: Published - Feb 2023

Keywords

  • Development
  • Diagnosis
  • Predictive algorithm
  • Prognosis
  • Risk prediction
  • Validation
