Abstract
In human disease studies, it has become common to collect multiple omics datasets measured at different molecular levels. The aim is to study the underlying mechanisms of disease from different perspectives by analyzing these datasets jointly. This thesis develops statistical methodologies to model a disease outcome with two omics datasets. We consider latent variable methods for constructing low-dimensional components that represent the two omics datasets, and linear models for associating these components with a disease outcome. The latent variable methods address the statistical challenges of high dimensionality, correlations within and between omics datasets, and systematic differences between datasets. The linear models provide flexibility for various study designs and different distributions of disease outcomes. We develop both two-stage methods, in which the latent variable model and the linear model are fitted separately, and one-stage methods, in which the two are fitted simultaneously. The two-stage methods are computationally fast and offer more flexibility in the linear models, while the one-stage methods provide unbiased inference results. All methods are validated and can be used in a wide range of disease studies.
Original language | English |
---|---|
Awarding Institution |
Supervisors/Advisors |
Award date | 22 May 2023 |
Place of Publication | Utrecht |
Publisher | |
Print ISBNs | 978-94-6483-123-8 |
DOIs | |
Publication status | Published - 22 May 2023 |
Keywords
- Omics research
- Omics heterogeneity
- Data integration
- High dimensional statistics
- Dimension reduction
- Partial least squares
- Two-stage modelling
- Joint modelling
- Generalized linear models
- GLM-PO2PLS