TY - JOUR
T1 - Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer
AU - Bejnordi, Babak Ehteshami
AU - Veta, Mitko
AU - Van Diest, Paul Johannes
AU - Van Ginneken, Bram
AU - Karssemeijer, Nico
AU - Litjens, Geert
AU - van der Laak, Jeroen A W M
AU - Hermsen, Meyke
AU - Manson, Quirine F.
AU - Balkenhol, Maschenka
AU - Geessink, Oscar
AU - Stathonikos, Nikolaos
AU - van Dijk, Marcory C R F
AU - Bult, Peter
AU - Beca, Francisco
AU - Beck, Andrew H.
AU - Wang, Dayong
AU - Khosla, Aditya
AU - Gargeya, Rishab
AU - Irshad, Humayun
AU - Zhong, Aoxiao
AU - Dou, Qi
AU - Li, Quanzheng
AU - Chen, Hao
AU - Lin, Huang Jing
AU - Heng, Pheng-Ann
AU - Haß, Christian
AU - Bruni, Elia
AU - Wong, Quincy
AU - Halici, Ugur
AU - Öner, Mustafa Ümit
AU - Cetin-Atalay, Rengul
AU - Berseth, Matt
AU - Khvatkov, Vitali
AU - Vylegzhanin, Alexei
AU - Kraus, Oren
AU - Shaban, Muhammad
AU - Rajpoot, Nasir M.
AU - Awan, Ruqayya
AU - Sirinukunwattana, Korsuk
AU - Qaiser, Talha
AU - Tsang, Yee Wah
AU - Tellez, David
AU - Annuscheit, Jonas
AU - Hufnagl, Peter
AU - Valkonen, Mira
AU - Kartasalo, Kimmo
AU - Latonen, Leena
AU - Ruusuvuori, Pekka
AU - Liimatainen, Kaisa
N1 - Funding Information:
All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Dr Veta reported receiving grant funding from Netherlands Organization for Scientific Research. Dr van Ginneken reported being a co-founder of and holding shares in Thirona and receiving grant funding and royalties from Mevis Medical Solutions. Dr Karssemeijer reported holding shares in Volpara Solutions, QView Medical, and ScreenPoint Medical BV; receiving consulting fees from QView Medical; and being an employee of ScreenPoint Medical BV. Dr van der Laak reported receiving personal fees from Philips, ContextVision, and Diagnostic Services Manitoba. Dr Manson reported receiving grant funding from Dutch Cancer Society. Mr Geessink reported receiving grant funding from Dutch Cancer Society. Dr Beca reported receiving personal fees from PathAI and Nvidia and owning stock in Nvidia. Dr Li reported receiving grant funding from the National Institutes of Health. Dr Ruusuvuori reported receiving grant funding from Finnish Funding Agency for Innovation. No other disclosures were reported.
Funding Information:
were funded by Stichting IT Projecten and by the Fonds Economische Structuurversterking (tEPIS/TRAIT project; LSH-FES Program 2009; DFES1029161 and FES1103JJT8U). Fonds Economische Structuurversterking also supported (in kind) web-access to whole-slide images. This work was supported by grant 601040 from the Seventh Framework Programme for Research–funded VPH-PRISM project of the European Union (Mr Ehteshami Bejnordi).
Publisher Copyright:
© 2017 American Medical Association. All rights reserved.
PY - 2017/12/12
Y1 - 2017/12/12
N2 - Importance: Application of deep learning algorithms to whole-slide pathology images can potentially improve diagnostic accuracy and efficiency. Objective: Assess the performance of automated deep learning algorithms at detecting metastases in hematoxylin and eosin-stained tissue sections of lymph nodes of women with breast cancer and compare it with pathologists' diagnoses in a diagnostic setting. Design, Setting, and Participants: Researcher challenge competition (CAMELYON16) to develop automated solutions for detecting lymph node metastases (November 2015-November 2016). A training data set of whole-slide images from 2 centers in the Netherlands with (n = 110) and without (n = 160) nodal metastases verified by immunohistochemical staining was provided to challenge participants to build algorithms. Algorithm performance was evaluated in an independent test set of 129 whole-slide images (49 with and 80 without metastases). The same test set of corresponding glass slides was also evaluated by a panel of 11 pathologists with time constraint (WTC) from the Netherlands to ascertain likelihood of nodal metastases for each slide in a flexible 2-hour session, simulating routine pathology workflow, and by 1 pathologist without time constraint (WOTC). Exposures: Deep learning algorithms submitted as part of a challenge competition or pathologist interpretation. Main Outcomes and Measures: The presence of specific metastatic foci and the absence vs presence of lymph node metastasis in a slide or image using receiver operating characteristic curve analysis. The 11 pathologists participating in the simulation exercise rated their diagnostic confidence as definitely normal, probably normal, equivocal, probably tumor, or definitely tumor. Results: The area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.556 to 0.994. The top-performing algorithm achieved a lesion-level, true-positive fraction comparable with that of the pathologist WOTC (72.4% [95% CI, 64.3%-80.4%]) at a mean of 0.0125 false-positives per normal whole-slide image. For the whole-slide image classification task, the best algorithm (AUC, 0.994 [95% CI, 0.983-0.999]) performed significantly better than the pathologists WTC in a diagnostic simulation (mean AUC, 0.810 [range, 0.738-0.884]; P < .001). The top 5 algorithms had a mean AUC that was comparable with the pathologist interpreting the slides in the absence of time constraints (mean AUC, 0.960 [range, 0.923-0.994] for the top 5 algorithms vs 0.966 [95% CI, 0.927-0.998] for the pathologist WOTC). Conclusions and Relevance: In the setting of a challenge competition, some deep learning algorithms achieved better diagnostic performance than a panel of 11 pathologists participating in a simulation exercise designed to mimic routine pathology workflow; algorithm performance was comparable with an expert pathologist interpreting whole-slide images without time constraints. Whether this approach has clinical utility will require evaluation in a clinical setting.
KW - Algorithms
KW - Breast Neoplasms/pathology
KW - Female
KW - Humans
KW - Lymphatic Metastasis/diagnosis
KW - Machine Learning
KW - Pathologists
KW - Pathology, Clinical
KW - ROC Curve
UR - http://www.scopus.com/inward/record.url?scp=85038431889&partnerID=8YFLogxK
U2 - 10.1001/jama.2017.14585
DO - 10.1001/jama.2017.14585
M3 - Article
C2 - 29234806
AN - SCOPUS:85038431889
SN - 0098-7484
VL - 318
SP - 2199
EP - 2210
JO - JAMA - Journal of the American Medical Association
JF - JAMA - Journal of the American Medical Association
IS - 22
ER -