TY - JOUR
T1 - When accurate prediction models yield harmful self-fulfilling prophecies
AU - van Amsterdam, Wouter A.C.
AU - van Geloven, Nan
AU - Krijthe, Jesse H.
AU - Ranganath, Rajesh
AU - Cinà, Giovanni
N1 - Publisher Copyright: © 2025 The Author(s)
PY - 2025/4/11
Y1 - 2025/4/11
AB - Prediction models are popular in medical research and practice. Many expect that by predicting patient-specific outcomes, these models have the potential to inform treatment decisions, and they are frequently lauded as instruments for personalized, data-driven healthcare. We show, however, that using prediction models for decision-making can lead to harm, even when the predictions exhibit good discrimination after deployment. These models are harmful self-fulfilling prophecies: their deployment harms a group of patients, but the worse outcome of these patients does not diminish the discrimination of the model. Our main result is a formal characterization of a set of such prediction models. Next, we show that models that are well calibrated before and after deployment are useless for decision-making, as they make no change in the data distribution. These results call for a reconsideration of standard practices for validation and deployment of prediction models that are used in medical decisions.
KW - causal inference
KW - data drift
KW - decision support techniques
KW - deployment
KW - monitoring
KW - prognosis
UR - http://www.scopus.com/inward/record.url?scp=105002303938&partnerID=8YFLogxK
U2 - 10.1016/j.patter.2025.101229
DO - 10.1016/j.patter.2025.101229
M3 - Article
AN - SCOPUS:105002303938
VL - 6
JO - Patterns
JF - Patterns
IS - 4
M1 - 101229
ER -