TY - JOUR
T1 - Adjusting for population stratification in polygenic risk score analyses
T2 - a guide for model specifications in the UK Biobank
AU - Lin, Bochao Danae
AU - Pries, Lotta-Katrin
AU - van Os, Jim
AU - Luykx, Jurjen J
AU - Rutten, Bart P F
AU - Guloksuz, Sinan
N1 - Funding Information:
L-KP is supported by the Kootstra Talent Fellowship of Maastricht University. BPFR was funded by a VIDI award number 91718336 from the Netherlands Scientific Organisation. SG and JvO are supported by the Ophelia research project, ZonMw grant number: 636340001. SG, BPFR, and JvO are supported by the YOUTH-GEMs project, funded by the European Union’s Horizon Europe program under the grant agreement number: 101057182.
Publisher Copyright:
© 2023, The Author(s), under exclusive licence to The Japan Society of Human Genetics.
PY - 2023/9
Y1 - 2023/9
AB - The current study was conducted to provide general guidance on model specification in polygenic risk score (PRS) analyses of the UK Biobank, such as which covariates to adjust for (i.e. age, sex, recruitment center, and genetic batch) and how many principal components (PCs) to include. To cover behavioral, physical, and mental health outcomes, we evaluated three continuous outcomes (BMI, smoking, and drinking) and two binary outcomes (major depressive disorder and educational attainment). We fitted 3280 models (656 per phenotype), each including a different set of covariates. We compared these model specifications using regression parameters such as R², coefficients, and P values, as well as ANOVA tests. The findings suggest that including up to three PCs appears to be sufficient to control for population stratification for most outcomes, whereas including other covariates (particularly age and sex) appears to be more important for model performance.
UR - http://www.scopus.com/inward/record.url?scp=85159348543&partnerID=8YFLogxK
U2 - 10.1038/s10038-023-01161-1
DO - 10.1038/s10038-023-01161-1
M3 - Article
C2 - 37188914
SN - 1434-5161
VL - 68
SP - 653
EP - 656
JO - Journal of Human Genetics
JF - Journal of Human Genetics
IS - 9
ER -