TY - JOUR
T1 - Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models
AU - Andaur Navarro, Constanza L
AU - Damen, Johanna AA
AU - van Smeden, Maarten
AU - Takada, Toshihiko
AU - Nijman, Steven WJ
AU - Dhiman, Paula
AU - Ma, Jie
AU - Collins, Gary S
AU - Bajpai, Ram
AU - Riley, Richard D
AU - Moons, Karel GM
AU - Hooft, Lotty
N1 - Funding Information:
Funding: GSC is funded by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC) and by Cancer Research UK program grant (C49297/A27294). PD is funded by the NIHR Oxford BRC. RB is affiliated to the National Institute for Health and Care Research (NIHR) Applied Research Collaboration (ARC) West Midlands. The views expressed are those of the authors and not necessarily those of the NHS, NIHR, or Department of Health and Social Care. None of the funding sources had a role in the design, conduct, analyses, or reporting of the study or in the decision to submit the manuscript for publication. Declaration of interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Publisher Copyright:
© 2022 The Author(s)
PY - 2023/2
Y1 - 2023/2
AB - BACKGROUND AND OBJECTIVES: We sought to summarize the study design, modelling strategies, and performance measures reported in studies on clinical prediction models developed using machine learning techniques. METHODS: We searched PubMed for articles published between 01/01/2018 and 31/12/2019, describing the development, or the development with external validation, of a multivariable prediction model using any supervised machine learning technique. No restrictions were made based on study design, data source, or predicted patient-related health outcomes. RESULTS: We included 152 studies: 58 (38.2% [95% CI 30.8-46.1]) were diagnostic and 94 (61.8% [95% CI 53.9-69.2]) were prognostic studies. Most studies reported only the development of prediction models (n = 133, 87.5% [95% CI 81.3-91.8]), focused on binary outcomes (n = 131, 86.2% [95% CI 79.8-90.8]), and did not report a sample size calculation (n = 125, 82.2% [95% CI 75.4-87.5]). The most common algorithms used were support vector machine (n = 86/522, 16.5% [95% CI 13.5-19.9]) and random forest (n = 73/522, 14.0% [95% CI 11.3-17.2]). Values for the area under the receiver operating characteristic curve ranged from 0.45 to 1.00. Calibration metrics were often missing (n = 494/522, 94.6% [95% CI 92.4-96.3]). CONCLUSION: Our review revealed that greater attention to the handling of missing values, methods for internal validation, and reporting of calibration is required to improve the methodological conduct of studies on machine learning-based prediction models. SYSTEMATIC REVIEW REGISTRATION: PROSPERO, CRD42019161764.
KW - Development
KW - Diagnosis
KW - Predictive algorithm
KW - Prognosis
KW - Risk prediction
KW - Validation
UR - https://www.scopus.com/pages/publications/85144613831
U2 - 10.1016/j.jclinepi.2022.11.015
DO - 10.1016/j.jclinepi.2022.11.015
M3 - Article
C2 - 36436815
SN - 0895-4356
VL - 154
SP - 8
EP - 22
JO - Journal of Clinical Epidemiology
JF - Journal of Clinical Epidemiology
ER -