TY - JOUR
T1 - Propensity-based standardization to enhance the validation and interpretation of prediction model discrimination for a target population
AU - de Jong, Valentijn M.T.
AU - Hoogland, Jeroen
AU - Moons, Karel G.M.
AU - Riley, Richard D.
AU - Nguyen, Tri Long
AU - Debray, Thomas P.A.
N1 - Funding Information:
We thank all the reviewers for very constructive feedback that has improved the article. This project has received funding from the European Union's Horizon 2020 research and innovation programme under ReCoDID Grant agreement No. 825746.
Publisher Copyright:
© 2023 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
PY - 2023/8/30
Y1 - 2023/8/30
AB - External validation of the discriminative ability of prediction models is of key importance. However, the interpretation of such evaluations is challenging, as the ability to discriminate depends on both the sample characteristics (ie, case-mix) and the generalizability of predictor coefficients, but most discrimination indices do not provide any insight into their respective contributions. To disentangle differences in discriminative ability across external validation samples due to a lack of model generalizability from differences in sample characteristics, we propose propensity-weighted measures of discrimination. These weighted metrics, which are derived from propensity scores for sample membership, are standardized for case-mix differences between the model development and validation samples, allowing for a fair comparison of discriminative ability in terms of model characteristics in a target population of interest. We illustrate our methods with the validation of eight prediction models for deep vein thrombosis in 12 external validation data sets and assess our methods in a simulation study. In the illustrative example, propensity score standardization reduced between-study heterogeneity of discrimination, indicating that between-study variability was partially attributable to case-mix. The simulation study showed that only flexible propensity-score methods (allowing for non-linear effects) produced unbiased estimates of model discrimination in the target population, and only when the positivity assumption was met. Propensity score-based standardization may facilitate the interpretation of (heterogeneity in) discriminative ability of a prediction model as observed across multiple studies, and may guide model updating strategies for a particular target population. Careful propensity score modeling with attention for non-linear relations is recommended.
KW - concordance
KW - external validation
KW - prediction model
KW - propensity score
KW - standardization
UR - http://www.scopus.com/inward/record.url?scp=85163091081&partnerID=8YFLogxK
U2 - 10.1002/sim.9817
DO - 10.1002/sim.9817
M3 - Article
C2 - 37311563
AN - SCOPUS:85163091081
SN - 0277-6715
VL - 42
SP - 3508
EP - 3528
JO - Statistics in Medicine
JF - Statistics in Medicine
IS - 19
ER -