Abstract
A repeat expansion in the C9orf72 gene is the most common genetic cause of amyotrophic lateral sclerosis (ALS) in Europe. The mutation is an expanded GGGGCC repeat and is considered pathological when there are more than 30 repeats. Because it is a repeat expansion, it escapes detection in many genotyping studies and can only be measured directly from whole-genome sequencing data, which limits the ability to perform large-scale C9orf72 studies. To address this, we developed an imputation-based method to measure the C9orf72 repeat expansion length indirectly.
We estimated the true length of the expansion in 8,917 whole-genome sequenced samples from Project MinE using ExpansionHunter and built a representative reference panel from 7,000 of them. All expanded repeat lengths were collapsed into a single ‘30+’ allele. We masked the C9orf72 repeat in the remaining 1,917 samples and imputed it with Beagle 5.43. We then evaluated how well samples were classified as expanded or non-expanded at a cut-off of 30 repeats, using two classification criteria: the ‘best guess’ genotype estimated by Beagle, and the dosage of the expanded allele. With the ‘best guess’ genotype, 95.6% of samples were correctly classified, and 125 of the 168 samples with a pathological expansion were detected. With the dosage at the lowest possible threshold of 0.01, 90% of samples were correctly classified, and 142 of the 168 samples with an expansion were detected. In both cases, our model outperformed a previously used classifier based on the dosage of the most significant GWAS SNP (rs2453555), which had an accuracy of only 82%.
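The evaluation described above can be sketched as follows. This is a minimal illustration on invented toy data, not the authors' pipeline: the `Sample` record, the function names, and the use of `>=` for the dosage threshold are assumptions made for the example. Only the cut-off of 30 repeats, the collapsed ‘30+’ allele, and the 0.01 dosage threshold come from the text.

```python
from typing import Callable, List, NamedTuple, Tuple

CUTOFF = 30        # more than 30 repeats is treated as expanded
EXPANDED = "30+"   # single collapsed label for every expanded allele


def collapse_allele(repeat_length: int) -> str:
    """Map a measured repeat length to a reference-panel allele label."""
    return EXPANDED if repeat_length > CUTOFF else str(repeat_length)


class Sample(NamedTuple):
    """Hypothetical record for one masked validation sample."""
    true_lengths: Tuple[int, int]   # sequencing-based repeat lengths (toy values)
    best_guess: Tuple[str, str]     # imputed most-likely alleles
    expanded_dosage: float          # expected count of the '30+' allele, 0 to 2


def truly_expanded(s: Sample) -> bool:
    return any(length > CUTOFF for length in s.true_lengths)


def call_by_best_guess(s: Sample) -> bool:
    return EXPANDED in s.best_guess


def call_by_dosage(s: Sample, threshold: float = 0.01) -> bool:
    # Assumption: a sample is called expanded when dosage >= threshold.
    return s.expanded_dosage >= threshold


def evaluate(samples: List[Sample],
             caller: Callable[[Sample], bool]) -> Tuple[float, int, int]:
    """Return (accuracy, expansions detected, total true expansions)."""
    correct = sum(1 for s in samples if truly_expanded(s) == caller(s))
    detected = sum(1 for s in samples if truly_expanded(s) and caller(s))
    n_expanded = sum(1 for s in samples if truly_expanded(s))
    return correct / len(samples), detected, n_expanded


# Invented toy data, chosen to show how the two criteria can disagree.
samples = [
    Sample((2, 8), ("2", "8"), 0.00),        # non-expanded, called correctly
    Sample((2, 500), ("2", EXPANDED), 0.98), # expanded, caught by both
    Sample((5, 45), ("5", "8"), 0.20),       # expanded, caught by dosage only
    Sample((2, 2), ("2", "2"), 0.05),        # non-expanded, dosage false positive
]

for name, caller in [("best guess", call_by_best_guess),
                     ("dosage", call_by_dosage)]:
    accuracy, detected, n_expanded = evaluate(samples, caller)
    print(f"{name}: accuracy={accuracy:.2f}, detected {detected}/{n_expanded}")
```

The toy data mirror the trade-off reported above: a low dosage threshold recovers expansions that the ‘best guess’ genotype misses, at the cost of additional false positives.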
We also investigated haplotype blocks in the C9orf72 region of the samples that were falsely classified as ‘expanded’. Their haplotypes differed significantly from those of truly non-expanded samples. This indicates that the false positives were not random, and that haplotype analysis may be used to refine the model.
Our method also facilitated a large-scale genetic study of ALS progression. We imputed the C9orf72 region for 23,351 ALS samples and kept 15,178 samples after quality control. With 1,124 (7.4%) samples classified as expanded, we performed a GWAS meta-analysis of survival time, but found no significant genetic modifiers of progression.
We plan to impute other short tandem repeats from GWAS data to obtain the largest screen of repeat expansions in ALS to date.
| Original language | English |
|---|---|
| Publication status | Unpublished - 2024 |
| Event | ENCALS 2024 - Duration: 17 Jun 2024 → 19 Jun 2024 https://www.encals.eu/meetings/stockholm/ |
Keywords
- ALS (Amyotrophic lateral sclerosis)
- Imputation
- C9orf72 Protein/genetics