TY - JOUR
T1 - Statistical method for modeling sequencing data from different technologies in longitudinal studies with application to Huntington disease
AU - Fuady, Angga M
AU - van Roon-Mom, Willeke M C
AU - Kiełbasa, Szymon M
AU - Uh, Hae-Won
AU - Houwing-Duistermaat, Jeanine J
N1 - Funding Information:
This work was supported by Indonesian Endowment Fund for Education (LPDP), Ministry of Finance, Indonesia, the European Union's Horizon 2020 grants IMforFUTURE (grant agreement No. 721815), the European Union's Seventh Framework Programme FP7‐Health‐F5‐2012 MIMOmics (grant agreement No. 305280), Centre for Medical Systems Biology within the framework of the Netherlands Genomics Initiative/Netherlands Organization for Scientific Research and Dutch Centre for Biomedical Genetics and the European Community's Seventh Framework Programme (FP7/2007‐2013) (grant agreement no. 2012‐305121) ‘Integrated European ‐omics research project for diagnosis and therapy in rare neuromuscular and neurodegenerative diseases (NEUROMICS)’.
Publisher Copyright:
© 2020 The Authors. Biometrical Journal published by Wiley-VCH GmbH.
PY - 2021/4
Y1 - 2021/4
N2 - Advancement of gene expression measurements in longitudinal studies enables the identification of genes associated with disease severity over time. However, problems arise when the technology used to measure gene expression differs between time points. Observed differences between the results obtained at different time points can be caused by technical differences. Modeling the two measurements jointly over time might provide insight into the causes of these different results. Our work is motivated by a study of gene expression data of blood samples from Huntington disease patients, which were obtained using two different sequencing technologies. At time point 1, DeepSAGE technology was used to measure the gene expression, with a subsample also measured using RNA-Seq technology. At time point 2, all samples were measured using RNA-Seq technology. Significant associations between gene expression measured by DeepSAGE and disease severity using data from the first time point could not be replicated by the RNA-Seq data from the second time point. We modeled the relationship between the two sequencing technologies using the data from the overlapping samples. We used linear mixed models with either DeepSAGE or RNA-Seq measurements as the dependent variable and disease severity as the independent variable. In conclusion, (1) for one out of 14 genes, the initial significant result could be replicated with both technologies using data from both time points; (2) statistical efficiency is lost due to disagreement between the two technologies, measurement error when predicting gene expressions, and the need to include additional parameters to account for possible differences.
AB - Advancement of gene expression measurements in longitudinal studies enables the identification of genes associated with disease severity over time. However, problems arise when the technology used to measure gene expression differs between time points. Observed differences between the results obtained at different time points can be caused by technical differences. Modeling the two measurements jointly over time might provide insight into the causes of these different results. Our work is motivated by a study of gene expression data of blood samples from Huntington disease patients, which were obtained using two different sequencing technologies. At time point 1, DeepSAGE technology was used to measure the gene expression, with a subsample also measured using RNA-Seq technology. At time point 2, all samples were measured using RNA-Seq technology. Significant associations between gene expression measured by DeepSAGE and disease severity using data from the first time point could not be replicated by the RNA-Seq data from the second time point. We modeled the relationship between the two sequencing technologies using the data from the overlapping samples. We used linear mixed models with either DeepSAGE or RNA-Seq measurements as the dependent variable and disease severity as the independent variable. In conclusion, (1) for one out of 14 genes, the initial significant result could be replicated with both technologies using data from both time points; (2) statistical efficiency is lost due to disagreement between the two technologies, measurement error when predicting gene expressions, and the need to include additional parameters to account for possible differences.
KW - DeepSAGE
KW - linear mixed model
KW - measurement error
KW - quality control
KW - RNA-Seq
UR - http://www.scopus.com/inward/record.url?scp=85097889063&partnerID=8YFLogxK
U2 - 10.1002/bimj.201900235
DO - 10.1002/bimj.201900235
M3 - Article
C2 - 33350510
SN - 0323-3847
VL - 63
SP - 745
EP - 760
JO - Biometrical Journal
JF - Biometrical Journal
IS - 4
ER -