Abstract
Prediction models are a valuable tool in medical practice, as they can help in diagnosis and prognosis. In diagnosis they can be used to calculate the most likely disease or other health status for a patient. In prognosis they can predict someone's future health status. They can therefore be used for medical decision making as well as for informing someone about their (future) health. Using the models' predictions, one can decide on the necessity of additional diagnostic tests, recommend lifestyle changes, choose other preventive strategies, identify the most effective treatment for an individual patient, and evaluate the quality of medical centers.
These prediction models need to be developed and tested on data that is sufficiently representative of the patients in practice. However, there can be major differences between patients (and variable associations) in different settings, geographic areas and studies. This implies that a developed prediction model might not perform consistently well when applied in routine care. To study this, we combine data from patients from different settings, areas and studies, which requires more sophisticated statistical methods.
In this thesis I have evaluated existing and developed new statistical methods for analyzing data from multiple sources, with a particular emphasis on the development and testing of prediction models for diagnosis and prognosis. With the methods presented in this thesis I aimed to facilitate prediction model research by better accounting for between-study heterogeneity.
In chapter 2, I review and discuss methods for pooling treatment effects while accounting for differences in baseline risk across studies. I describe how the between-study heterogeneity can be investigated and possibly resolved.
In chapter 3, I describe how a prediction model’s generalizability to other populations and settings (rather than reproducibility in similar ones) can be incorporated into the model development process, when data from multiple sources are available.
In chapter 4, I demonstrate that when a prediction model is validated in external data where the predictor distributions differ from the target population, the performance estimates can be standardized with propensity methods, so that they can be interpreted in light of the target population.
In chapter 5, I have developed a novel method that accounts for (differences in) misclassification across studies, while simultaneously accounting for personal characteristics. Motivated by real data, I have used simulation studies to demonstrate that, when used appropriately, the method produces unbiased estimates and has better coverage than standard methods which only use correctly classified observations or which naively use incorrectly classified data.
In chapter 6, I have evaluated estimation methods for prediction models through an illustrative example and extensive simulation studies. I have explored minimum sample size criteria for prediction model development and provided guidance for applied researchers.
Finally, in chapter 7, I have provided an overview of methods for prognostic research using multiple data sets. I have described methods for prognostic factor and prognostic model research for when individual participant data, aggregate data or a combination thereof are available.
Original language | English
---|---
Award date | 8 Dec 2020
Print ISBNs | 978-90-393-7261-6
Publication status | Published - 8 Dec 2020
Keywords
- Individual participant data
- Meta-analysis
- Clustered data
- Statistics
- Methods
- Prediction
- Heterogeneity
- Measurement error
- Internal-External Cross-Validation
- Prognosis