A simulation study of sample size demonstrated the importance of the number of events per variable to develop prediction models in clustered data

L. Wynants*, W. Bouwmeester, K. G. M. Moons, M. Moerbeek, D. Timmerman, S. Van Huffel, B. Van Calster, Y. Vergouwe

*Corresponding author for this work

Research output: Contribution to journal > Article > Academic > peer-review

Abstract

Objectives: This study aims to investigate the influence of the amount of clustering [intraclass correlation (ICC) = 0%, 5%, or 20%], the number of events per variable (EPV) or candidate predictor (EPV = 5, 10, 20, or 50), and backward variable selection on the performance of prediction models.
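The EPV levels above follow from simple arithmetic: the number of outcome events divided by the number of candidate predictors. The sketch below illustrates that ratio and the number of events a target EPV implies. The function names and the example counts are hypothetical and do not come from the study.

```python
import math


def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """Return EPV: outcome events divided by candidate predictors."""
    if n_candidate_predictors <= 0:
        raise ValueError("need at least one candidate predictor")
    return n_events / n_candidate_predictors


def required_events(target_epv: float, n_candidate_predictors: int) -> int:
    """Return the smallest number of events that reaches the target EPV."""
    return math.ceil(target_epv * n_candidate_predictors)


if __name__ == "__main__":
    # Hypothetical example: 80 events and 8 candidate predictors.
    print(events_per_variable(80, 8))
    # Events needed for each EPV level studied, with 8 candidate predictors.
    for epv in (5, 10, 20, 50):
        print(epv, required_events(epv, 8))
```

Note that the denominator counts candidate predictors, so variables later dropped by backward selection still count toward EPV.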

Study Design and Setting: Researchers frequently combine data from several centers to develop clinical prediction models. In our simulation study, we developed models from clustered training data using multilevel logistic regression and validated them in external data.
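A minimal sketch of how clustered binary data of this kind could be generated, assuming a random-intercept logistic model and the latent-variable definition of the ICC, ICC = s2 / (s2 + pi^2/3), where s2 is the random-intercept variance. The abstract does not state the authors' data-generating mechanism, so the function names, the standard-normal predictors, and all parameter values here are illustrative assumptions.

```python
import math
import random


def random_intercept_sd(icc: float) -> float:
    """SD of the center-level random intercept for a given latent-scale ICC.

    Solves icc = s2 / (s2 + pi^2 / 3) for s2 (assumed ICC definition).
    """
    if not 0.0 <= icc < 1.0:
        raise ValueError("icc must be in [0, 1)")
    return math.sqrt(icc * (math.pi ** 2 / 3.0) / (1.0 - icc))


def simulate_clustered_outcomes(n_centers, n_per_center, icc, beta,
                                intercept=-1.0, seed=0):
    """Return a list of (center, predictors, outcome) rows.

    Each center gets one random intercept; predictors are independent
    standard normals (an assumption made purely for illustration).
    """
    rng = random.Random(seed)
    sd = random_intercept_sd(icc)
    rows = []
    for center in range(n_centers):
        u = rng.gauss(0.0, sd)  # center-specific shift on the logit scale
        for _ in range(n_per_center):
            x = [rng.gauss(0.0, 1.0) for _ in beta]
            eta = intercept + u + sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            y = 1 if rng.random() < p else 0
            rows.append((center, x, y))
    return rows


if __name__ == "__main__":
    data = simulate_clustered_outcomes(20, 50, icc=0.20, beta=[0.5, -0.3])
    print(len(data), sum(y for _, _, y in data))
```

With `icc=0.0` the random-intercept SD is zero and the centers become exchangeable, which corresponds to the no-clustering scenario.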

Results: The amount of clustering was not meaningfully associated with the models' predictive performance. The median calibration slope of models built in samples with EPV = 5 and strong clustering (ICC = 20%) was 0.71. With EPV = 5 and ICC = 0%, it was 0.72. A higher EPV was associated with better performance: the median calibration slope was 0.85 at EPV = 10 and ICC = 20% and 0.96 at EPV = 50 and ICC = 20%. Variable selection sometimes led to a substantial relative bias in the estimated predictor effects (up to 118% at EPV = 5), but this had little influence on the models' performance in our simulations.
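The calibration slope is conventionally estimated by regressing the observed outcomes in validation data on the model's linear predictor with logistic regression; a slope below 1 indicates predictions that are too extreme, as expected from an overfitted model. The sketch below fits that two-parameter regression by Newton-Raphson on simulated data. It ignores clustering and is not the authors' procedure for multilevel models; the helper names and the simulated validation set are assumptions made for illustration.

```python
import math
import random


def calibration_slope(linear_predictor, outcome, n_iter=50, tol=1e-10):
    """Fit logit(P(y=1)) = a + b * lp by Newton-Raphson; return (a, b)."""
    a, b = 0.0, 1.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for lp, y in zip(linear_predictor, outcome):
            p = 1.0 / (1.0 + math.exp(-(a + b * lp)))
            w = p * (1.0 - p)
            r = y - p
            g0 += r            # score for the intercept
            g1 += r * lp       # score for the slope
            h00 += w           # information matrix entries
            h01 += w * lp
            h11 += w * lp * lp
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def simulate_validation(true_slope, n, seed=0):
    """Hypothetical validation set in which the model's linear predictor
    is too extreme by a factor of 1 / true_slope."""
    rng = random.Random(seed)
    lps, ys = [], []
    for _ in range(n):
        lp = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-true_slope * lp))
        lps.append(lp)
        ys.append(1 if rng.random() < p else 0)
    return lps, ys


if __name__ == "__main__":
    lps, ys = simulate_validation(true_slope=0.7, n=20000, seed=1)
    print(calibration_slope(lps, ys))
```

In this simulated example the estimated slope should land near the 0.7 used to generate the outcomes, which is roughly the degree of miscalibration reported at EPV = 5.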

Conclusion: We recommend at least 10 EPV to fit prediction models in clustered data using logistic regression. Up to 50 EPV may be needed when variable selection is performed. © 2015 Elsevier Inc. All rights reserved.

Original language: English
Pages (from-to): 1406-1414
Number of pages: 9
Journal: Journal of Clinical Epidemiology
Volume: 68
Issue number: 12
Publication status: Published - Dec 2015

Keywords

  • Clustered data
  • Multicenter study
  • Events per variable
  • Logistic model
  • Prediction model
  • Simulation study
  • LOGISTIC-REGRESSION ANALYSIS
  • TUMOR-ANALYSIS IOTA
  • SMALL DATA SETS
  • MULTICENTER
  • SELECTION
