TY - JOUR
T1 - Risk prediction models for discrete ordinal outcomes
T2 - Calibration and the impact of the proportional odds assumption
AU - Edlinger, Michael
AU - van Smeden, Maarten
AU - Alber, Hannes F.
AU - Wanitschek, Maria
AU - Van Calster, Ben
N1 - Funding Information:
Michael Edlinger and Ben Van Calster were supported by Research Foundation - Flanders (FWO) grant G0B4716N. BVC was supported by Internal Funds KU Leuven grant C24M/20/064. The funding bodies had no role in the design of the study, data collection, statistical analysis, interpretation of data, or in writing of the manuscript.
Publisher Copyright:
© 2021 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
PY - 2022/4/15
Y1 - 2022/4/15
N2 - Calibration is a vital aspect of the performance of risk prediction models, but research in the context of ordinal outcomes is scarce. This study compared calibration measures for risk models predicting a discrete ordinal outcome, and investigated the impact of the proportional odds assumption on calibration and overfitting. We studied the multinomial, cumulative, adjacent category, continuation ratio, and stereotype logit/logistic models. To assess calibration, we investigated calibration intercepts and slopes, calibration plots, and the estimated calibration index. Using large sample simulations, we studied the performance of models for risk estimation under various conditions, assuming that the true model has either a multinomial logistic form or a cumulative logit proportional odds form. Small sample simulations were used to compare the tendency for overfitting between models. As a case study, we developed models to diagnose the degree of coronary artery disease (five categories) in symptomatic patients. When the true model was multinomial logistic, proportional odds models often yielded poor risk estimates, with calibration slopes deviating considerably from unity even on large model development datasets. The stereotype logistic model improved the calibration slope, but still provided biased risk estimates for individual patients. When the true model had a cumulative logit proportional odds form, multinomial logistic regression provided biased risk estimates, although these biases were modest. Nonproportional odds models require more parameters to be estimated from the data, and hence suffered more from overfitting. Despite larger sample size requirements, we generally recommend multinomial logistic regression for risk prediction modeling of discrete ordinal outcomes.
KW - Calibration
KW - Humans
KW - Logistic Models
KW - Probability
KW - Sample Size
UR - http://www.scopus.com/inward/record.url?scp=85121006619&partnerID=8YFLogxK
U2 - 10.1002/sim.9281
DO - 10.1002/sim.9281
M3 - Article
C2 - 34897756
AN - SCOPUS:85121006619
SN - 0277-6715
VL - 41
SP - 1334
EP - 1360
JO - Statistics in Medicine
JF - Statistics in Medicine
IS - 8
ER -