Abstract
The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers (MC3) project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration among a number of institutes and demonstrates how team science drives extremely large genomics projects. MC3 is a variant-calling project covering over 10,000 cancer exome samples from 33 cancer types. Over three million somatic variants were detected using seven different methods developed at institutions across the United States. These variants formed the basis for the PanCan Atlas papers.
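The ensemble idea in the abstract, several callers combined with artifact filtering, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the MC3 pipeline itself: the function name `ensemble_calls`, the caller labels, the "at least two callers" threshold, and the representation of artifacts as a plain set of flagged sites are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# A variant is identified here by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def ensemble_calls(
    calls_by_caller: Dict[str, Iterable[Variant]],
    min_callers: int = 2,
    artifact_sites: Iterable[Variant] = (),
) -> Dict[Variant, Set[str]]:
    """Keep variants reported by at least `min_callers` callers and not flagged as artifacts.

    The threshold and the artifact list are illustrative assumptions,
    not values taken from the publication.
    """
    # Record which callers support each variant.
    support: Dict[Variant, Set[str]] = defaultdict(set)
    for caller, variants in calls_by_caller.items():
        for variant in variants:
            support[variant].add(caller)

    # Drop flagged sites and variants with too little caller support.
    flagged = set(artifact_sites)
    return {
        variant: callers
        for variant, callers in support.items()
        if len(callers) >= min_callers and variant not in flagged
    }


if __name__ == "__main__":
    # Hypothetical calls from three unnamed callers.
    example = {
        "caller_a": [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")],
        "caller_b": [("chr1", 100, "A", "T"), ("chr3", 300, "C", "A")],
        "caller_c": [("chr1", 100, "A", "T"), ("chr3", 300, "C", "A")],
    }
    kept = ensemble_calls(example, artifact_sites=[("chr3", 300, "C", "A")])
    for variant, callers in kept.items():
        print(variant, sorted(callers))
```

Keeping the set of supporting callers for each retained variant, rather than only a count, lets downstream analyses apply stricter or looser support thresholds without rerunning the merge.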
Original language | English |
---|---|
Pages (from-to) | 271-281.e7 |
Journal | Cell Systems |
Volume | 6 |
Issue number | 3 |
DOIs | https://doi.org/10.1016/j.cels.2018.03.002 |
Publication status | Published - 28 Mar 2018 |
Keywords
- large-scale
- open science
- pan-cancer
- PanCanAtlas project
- reproducible computing
- somatic mutation calling
- TCGA
In: Cell Systems, Vol. 6, No. 3, 28.03.2018, p. 271-281.e7.
Research output: Contribution to journal › Article › Academic › peer-review
TY - JOUR
T1 - Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines
AU - Ellrott, Kyle
AU - Bailey, Matthew H.
AU - Saksena, Gordon
AU - Covington, Kyle R.
AU - Kandoth, Cyriac
AU - Stewart, Chip
AU - Hess, Julian
AU - Ma, Singer
AU - Chiotti, Kami E.
AU - McLellan, Michael
AU - Sofia, Heidi J.
AU - Hutter, Carolyn M.
AU - Getz, Gad
AU - Wheeler, David A.
AU - Ding, Li
AU - Caesar-Johnson, Samantha J.
AU - Demchok, John A.
AU - Felau, Ina
AU - Kasapi, Melpomeni
AU - Ferguson, Martin L.
AU - Hutter, Carolyn M.
AU - Sofia, Heidi J.
AU - Tarnuzzer, Roy
AU - Wang, Zhining
AU - Yang, Liming
AU - Zenklusen, Jean C.
AU - Zhang, Jiashan (Julia)
AU - Chudamani, Sudha
AU - Liu, Jia
AU - Lolla, Laxmi
AU - Naresh, Rashi
AU - Pihl, Todd
AU - Sun, Qiang
AU - Wan, Yunhu
AU - Wu, Ye
AU - Cho, Juok
AU - DeFreitas, Timothy
AU - Frazer, Scott
AU - Gehlenborg, Nils
AU - Getz, Gad
AU - Heiman, David I.
AU - Kim, Jaegil
AU - Lawrence, Michael S.
AU - Lin, Pei
AU - Meier, Sam
AU - Noble, Michael S.
AU - Saksena, Gordon
AU - Voet, Doug
AU - Zhang, Hailei
AU - de Krijger, Ronald
N1 - Funding Information: The authors would like to acknowledge the contributions of all of the members of the TCGA network. This work was funded by grants to the UC Santa Cruz Genomics Institute and supported by the NIH NHGRI (grant no. U54HG007990), NCI (U24CA143858), and NCI ITCR (grant no. R01CA180778); Oregon Health and Science University from the NCI (U24CA210957, U24CA143799); M.L.A. grants from NCI (R01CA183793, U24CA210950, U24CA210949, U24CA143883, CA150252, and U24CA143845) and Keck Center of the Gulf Coast Consortia for the Cancer Biology Training Program CPRIT (RP140113); NCI and NHGRI grants to McDonnell Genome Institute at Washington University (U24CA211006, U01HG006517, and U54HG003079); NCI grants to the Institute for Systems Biology (U24CA143835); NHGRI grants to the Broad Institute (U54HG003067); NCI and NHGRI grants to the Baylor College of Medicine (U54HG003273 and U24CA143843); NCI grants to MSKCC (U24CA143840); NCI grants to M.L.A. (U24CA143845, P30CA016672, and U24CA143883); NCI grants to University of North Carolina (U24CA143848); NCI grants to the Van Andel Research Institute (U24CA143882); NCI grants to Partners Healthcare (U24CA144025); NCI grants to Harvard (U24CA143867); and NCI grants to BC Cancer Foundation (U24CA143866). In addition, the authors would like to thank DNAnexus for the compute processes they provided for the variant calling and the Broad Institute for validation data processing. OxoG calculations in this study were performed on the Institute for Systems Biology-Cancer Genomics Cloud (ISB-CGC), a pilot project of the National Cancer Institute (under contract number HHSN261201400007C). The authors would like to acknowledge the many individuals who contributed to the success of the MC3 project, including Liu Xi, Walker Hale, Katayoon Kasaian, Ignaty Leshchiner, Yifei Men, Sheila M. Reynolds, Gordon B. Mills, John N. Weinstein, Rehan Akbani, Wenyi Wang, Yu Fan, Melpomeni Kasapi, Adam Struck, Alex Buchanan, Allison Creason, John Letaw, Myron Peto, Pavana Anur, Amie Radenbaugh, Christopher K. Wong, David Haussler, Joshua M. Stuart, Beifang Niu, Dave Larson, Steven Foltz, and Kai Ye. Funding Information: Michael Seiler, Peter G. Smith, Ping Zhu, Silvia Buonamici, and Lihua Yu are employees of H3 Biomedicine, Inc. Parts of this work are the subject of a patent application: WO2017040526 titled “Splice variants associated with neomorphic sf3b1 mutants.” Shouyoung Peng, Anant A. Agrawal, James Palacino, and Teng Teng are employees of H3 Biomedicine, Inc. Andrew D. Cherniack, Ashton C. Berger, and Galen F. Gao receive research support from Bayer Pharmaceuticals. Gordon B. Mills serves on the External Scientific Review Board of AstraZeneca. Anil Sood is on the Scientific Advisory Board for Kiyatec and is a shareholder in BioPath. Jonathan S. Serody receives funding from Merck, Inc. Kyle R. Covington is an employee of Castle Biosciences, Inc. Preethi H. Gunaratne is founder, CSO, and shareholder of NextmiRNA Therapeutics. Christina Yau is a part-time employee/consultant at NantOmics. Franz X. Schaub is an employee and shareholder of SEngine Precision Medicine, Inc. Carla Grandori is an employee, founder, and shareholder of SEngine Precision Medicine, Inc. Robert N. Eisenman is a member of the Scientific Advisory Boards and shareholder of Shenogen Pharma and Kronos Bio. Daniel J. Weisenberger is a consultant for Zymo Research Corporation. Joshua M. Stuart is the founder of Five3 Genomics and shareholder of NantOmics. Marc T. Goodman receives research support from Merck, Inc. Andrew J. Gentles is a consultant for Cibermed. Charles M. Perou is an equity stock holder, consultant, and Board of Directors member of BioClassifier and GeneCentric Diagnostics and is also listed as an inventor on patent applications on the Breast PAM50 and Lung Cancer Subtyping assays. Matthew Meyerson receives research support from Bayer Pharmaceuticals; is an equity holder in, consultant for, and Scientific Advisory Board chair for OrigiMed; and is an inventor of a patent for EGFR mutation diagnosis in lung cancer, licensed to LabCorp. Eduard Porta-Pardo is an inventor of a patent for domainXplorer. Han Liang is a shareholder and scientific advisor of Precision Scientific and Eagle Nebula. Da Yang is an inventor on a pending patent application describing the use of antisense oligonucleotides against specific lncRNA sequence as diagnostic and therapeutic tools. Yonghong Xiao was an employee and shareholder of TESARO, Inc. Bin Feng is an employee and shareholder of TESARO, Inc. Carter Van Waes received research funding for the study of IAP inhibitor ASTX660 through a Cooperative Agreement between NIDCD, NIH, and Astex Pharmaceuticals. Raunaq Malhotra is an employee and shareholder of Seven Bridges, Inc. Peter W. Laird serves on the Scientific Advisory Board for AnchorDx. Joel Tepper is a consultant at EMD Serono. Kenneth Wang serves on the Advisory Board for Boston Scientific, Microtech, and Olympus. Andrea Califano is a founder, shareholder, and advisory board member of DarwinHealth, Inc. and a shareholder and advisory board member of Tempus, Inc. Toni K. Choueiri serves as needed on advisory boards for Bristol-Myers Squibb, Merck, and Roche. Lawrence Kwong receives research support from Array BioPharma. Sharon E. Plon is a member of the Scientific Advisory Board for Baylor Genetics Laboratory. Beth Y. Karlan serves on the Advisory Board of Invitae. Publisher Copyright: © 2018 The Authors
PY - 2018/3/28
Y1 - 2018/3/28
N2 - The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers (MC3) project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration among a number of institutes and demonstrates how team science drives extremely large genomics projects. MC3 is a variant-calling project covering over 10,000 cancer exome samples from 33 cancer types. Over three million somatic variants were detected using seven different methods developed at institutions across the United States. These variants formed the basis for the PanCan Atlas papers.
KW - large-scale
KW - open science
KW - pan-cancer
KW - PanCanAtlas project
KW - reproducible computing
KW - somatic mutation calling
KW - TCGA
UR - http://www.scopus.com/inward/record.url?scp=85044569292&partnerID=8YFLogxK
U2 - 10.1016/j.cels.2018.03.002
DO - 10.1016/j.cels.2018.03.002
M3 - Article
AN - SCOPUS:85044569292
SN - 2405-4712
VL - 6
SP - 271-281.e7
JO - Cell Systems
JF - Cell Systems
IS - 3
ER -