Abstract
In this thesis, we use driver and/or passenger mutations detected by whole genome sequencing (WGS) to develop machine learning classifiers for cancer diagnostics (Chapters 2 and 3) and to study cancer development (Chapters 4 and 5). Chapter 2 describes the Classifier of HOmologous Recombination Deficiency (CHORD), which detects homologous recombination deficiency (HRD) using various mutation types, including microhomology deletions, small structural deletions and large structural duplications. Using CHORD, we found that HRD was most common in ovarian, breast, prostate and pancreatic cancer. We also found that HRD was often associated with loss-of-heterozygosity in all cancer types, with an increased contribution of deep deletions in prostate cancer. Chapter 3 describes the Cancer of Unknown Primary Location Resolver (CUPLR), a machine learning model that predicts tumor tissue of origin using a wide range of genomic features, including regional mutational density (RMD), mutational signatures, driver gene mutations, aneuploidy, viral insertions, and various structural variant (SV) types. We found that SV-related features improved tissue of origin classification for cancer types defined by such events, such as pilocytic astrocytomas (characterized by KIAA1549-BRAF fusions) and cervical cancer (characterized by human papillomavirus DNA insertions). In Chapter 4, we use liver organoids to study mutation accumulation and its contribution to cancer development in three liver diseases: alcoholic cirrhosis, non-alcoholic steatohepatitis (NASH), and primary sclerosing cholangitis (PSC). Surprisingly, we find that these liver diseases do not lead to detectable alterations in the mutation landscape. In Chapter 5, we perform a pan-cancer comparison of primary versus metastatic cancer and find that metastatic tumors differ from primary tumors mainly in their increased burden of small mutations and SVs, as a result of treatment exposure and cancer type specific endogenous mutational processes.
Lastly, in Chapter 6 we discuss the limitations and challenges faced in using WGS data to understand and diagnose cancer, and provide directions for future research.
Original language | English |
---|---|
Awarding Institution | |
Supervisors/Advisors | |
Award date | 30 Nov 2022 |
Publisher | |
Print ISBNs | 978-94-6423-953-9 |
DOIs | |
Publication status | Published - 30 Nov 2022 |
Keywords
- cancer genomics
- machine learning
- bioinformatics