TY - JOUR
T1 - Accurate and fast multiple-testing correction in eQTL studies
AU - Sul, Jae Hoon
AU - Raj, Towfique
AU - de Jong, Simone
AU - de Bakker, Paul I W
AU - Raychaudhuri, Soumya
AU - Ophoff, Roel A
AU - Stranger, Barbara E
AU - Eskin, Eleazar
AU - Han, Buhm
N1 - Copyright © 2015 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
PY - 2015/6/4
Y1 - 2015/6/4
AB - In studies of expression quantitative trait loci (eQTLs), it is of increasing interest to identify eGenes, the genes whose expression levels are associated with variation at a particular genetic variant. Detecting eGenes is important for follow-up analyses and prioritization because genes are the main entities in biological processes. To detect eGenes, one typically focuses on the genetic variant with the minimum p value among all variants in cis with a gene and corrects for multiple testing to obtain a gene-level p value. For performing multiple-testing correction, a permutation test is widely used. Because of growing sample sizes of eQTL studies, however, the permutation test has become a computational bottleneck in eQTL studies. In this paper, we propose an efficient approach for correcting for multiple testing and assess eGene p values by utilizing a multivariate normal distribution. Our approach properly takes into account the linkage-disequilibrium structure among variants, and its time complexity is independent of sample size. By applying our small-sample correction techniques, our method achieves high accuracy in both small and large studies. We have shown that our method consistently produces extremely accurate p values (accuracy > 98%) for three human eQTL datasets with different sample sizes and SNP densities: the Genotype-Tissue Expression pilot dataset, the multi-region brain dataset, and the HapMap 3 dataset.
KW - Data Interpretation, Statistical
KW - Gene Expression Regulation
KW - Genes
KW - Genetic Variation
KW - Humans
KW - Multivariate Analysis
KW - Normal Distribution
KW - Polymorphism, Single Nucleotide
KW - Probability
KW - Quantitative Trait Loci
KW - Sample Size
KW - Statistics, Nonparametric
UR - http://www.scopus.com/inward/record.url?scp=84930024854&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2015.04.012
DO - 10.1016/j.ajhg.2015.04.012
M3 - Article
C2 - 26027500
SN - 0002-9297
VL - 96
SP - 857
EP - 868
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 6
ER -