TY - JOUR
T1 - svMIL
T2 - Predicting the pathogenic effect of TAD boundary-disrupting somatic structural variants through multiple instance learning
AU - Nieboer, Marleen M.
AU - de Ridder, Jeroen
N1 - Publisher Copyright:
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
PY - 2020/12/1
Y1 - 2020/12/1
AB - Motivation: Despite the fact that structural variants (SVs) play an important role in cancer, methods to predict their effect, especially for SVs in non-coding regions, are lacking, leaving them often overlooked in the clinic. Non-coding SVs may disrupt the boundaries of Topologically Associated Domains (TADs), thereby affecting interactions between genes and regulatory elements such as enhancers. However, it is not known when such alterations are pathogenic. Although machine learning techniques are a promising solution to answer this question, representing the large number of interactions that an SV can disrupt in a single feature matrix is not trivial. Results: We introduce svMIL: A method to predict pathogenic TAD boundary-disrupting SV effects based on multiple instance learning, which circumvents the need for a traditional feature matrix by grouping SVs into bags that can contain any number of disruptions. We demonstrate that svMIL can predict SV pathogenicity, measured through same-sample gene expression aberration, for various cancer types. In addition, our approach reveals that somatic pathogenic SVs alter different regulatory interactions than somatic non-pathogenic SVs and germline SVs.
UR - http://www.scopus.com/inward/record.url?scp=85099208646&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btaa802
DO - 10.1093/bioinformatics/btaa802
M3 - Article
C2 - 33381833
AN - SCOPUS:85099208646
SN - 1367-4803
VL - 36
SP - i692
EP - i699
JO - Bioinformatics
JF - Bioinformatics
ER -