Understanding random resampling techniques for class imbalance correction and their consequences on calibration and discrimination of clinical risk prediction models

Marco Piccininni*, Maximilian Wechsung, Ben Van Calster, Jessica L. Rohmann, Stefan Konigorski, Maarten van Smeden

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Objective: Class imbalance is sometimes considered a problem when developing clinical prediction models and assessing their performance. To address it, correction strategies involving manipulations of the training dataset, such as random undersampling or oversampling, are frequently used. The aim of this article is to illustrate the consequences of these class imbalance correction strategies on clinical prediction models’ internal validity in terms of calibration and discrimination performance.

Methods: We used both heuristic intuition and formal mathematical reasoning to characterize the relations between the conditional probabilities of interest and the probabilities targeted when using random undersampling or oversampling. We propose a plug-in estimator as a natural correction for predictions obtained from models trained on artificially balanced datasets (“naïve” models). We conducted a Monte Carlo simulation with two different data generation processes and present a real-world example using data from the International Stroke Trial database to empirically demonstrate the consequences of applying random resampling techniques for class imbalance correction on calibration and discrimination (in terms of the area under the ROC curve, AUC) for logistic regression and tree-based prediction models.

Results: Across our simulations and in the real-world example, calibration of the naïve models was very poor. The models using the plug-in estimator generally outperformed the models relying on class imbalance correction in terms of calibration while achieving the same discrimination performance.

Conclusion: Random resampling techniques for class imbalance correction do not generally improve discrimination performance (i.e., AUC), and their use is hard to justify when the aim is to provide calibrated predictions. Improper use of such class imbalance correction techniques can lead to suboptimal data usage and less valid risk prediction models.
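To make the idea of a plug-in correction concrete, below is a minimal Python sketch of the general approach the abstract describes: a “naïve” logistic regression is trained on a randomly undersampled (artificially balanced) dataset, and its predicted probabilities are then mapped back to the original outcome prevalence via a standard Bayes/odds prior correction. The simulated data, the logistic regression model, and the exact form of the correction are illustrative assumptions; the paper’s proposed plug-in estimator and simulation settings may differ in detail.

```python
# Hedged sketch (not the authors' code): naive model on a balanced subsample,
# followed by a prior-correction plug-in to return predictions to the original scale.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulate an imbalanced binary outcome (roughly 5% events) -- assumed toy data
n = 20_000
X = rng.normal(size=(n, 3))
logit = -3.5 + X @ np.array([1.0, 0.5, -0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

pi_orig = y.mean()  # event prevalence in the original (imbalanced) data

# Random undersampling of the majority class down to a 1:1 ratio
pos_idx = np.flatnonzero(y == 1)
neg_idx = rng.choice(np.flatnonzero(y == 0), size=pos_idx.size, replace=False)
bal_idx = np.concatenate([pos_idx, neg_idx])

naive = LogisticRegression().fit(X[bal_idx], y[bal_idx])
pi_bal = y[bal_idx].mean()  # prevalence in the balanced training set (about 0.5)

# Naive predictions target P(Y=1 | X) under the *resampled* distribution
p_naive = naive.predict_proba(X)[:, 1]

# Plug-in correction: rescale the predicted odds by the ratio of
# original prior odds to resampled prior odds (standard prevalence shift)
odds_naive = p_naive / (1 - p_naive)
odds_corr = odds_naive * (pi_orig / (1 - pi_orig)) / (pi_bal / (1 - pi_bal))
p_corrected = odds_corr / (1 + odds_corr)

print(f"true prevalence:           {pi_orig:.3f}")
print(f"mean naive prediction:     {p_naive.mean():.3f}")      # far above prevalence
print(f"mean corrected prediction: {p_corrected.mean():.3f}")  # close to prevalence
```

Because this correction is a strictly increasing transformation of the naïve predictions, it leaves their ranking, and therefore the AUC, unchanged, which is consistent with the abstract’s observation that the plug-in approach improves calibration while achieving the same discrimination performance.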

Original language: English
Article number: 104666
Number of pages: 10
Journal: Journal of Biomedical Informatics
Volume: 155
DOIs
Publication status: Published - Jul 2024

Keywords

  • Calibration
  • Class imbalance
  • Discrimination
  • Prediction
  • Undersampling
