TY - JOUR
T1 - 1399 H&E-stained sentinel lymph node sections of breast cancer patients
T2 - The CAMELYON dataset
AU - Litjens, Geert
AU - Bandi, Peter
AU - Bejnordi, Babak Ehteshami
AU - Geessink, Oscar
AU - Balkenhol, Maschenka
AU - Bult, Peter
AU - Halilovic, Altuna
AU - Hermsen, Meyke
AU - van de Loo, Rob
AU - Vogels, Rob
AU - Manson, Quirine F.
AU - Stathonikos, Nikolas
AU - Baidoshvili, Alexi
AU - van Diest, Paul
AU - Wauters, Carla
AU - van Dijk, Marcory
AU - van der Laak, Jeroen
N1 - Funding Information:
Data collection and annotation were funded by Stichting IT Projecten and by the Fonds Economische Structuurversterking (tEPIS/TRAIT project; LSH-FES Program 2009; DFES1029161 and FES1103JJTBU). This work was also supported by grant 601040 from the FP7-funded VPH-PRISM project of the European Union.
Publisher Copyright:
© The Author(s) 2018. Published by Oxford University Press.
PY - 2018/5/31
Y1 - 2018/5/31
N2 - Background: The presence of lymph node metastases is one of the most important factors in breast cancer prognosis. The most common way to assess regional lymph node status is the sentinel lymph node procedure. The sentinel lymph node is the most likely lymph node to contain metastasized cancer cells and is excised, histopathologically processed, and examined by a pathologist. This tedious examination process is time-consuming and can lead to small metastases being missed. However, recent advances in whole-slide imaging and machine learning have opened an avenue for analysis of digitized lymph node sections with computer algorithms. For example, convolutional neural networks, a type of machine-learning algorithm, can be used to automatically detect cancer metastases in lymph nodes with high accuracy. To train machine-learning models, large, well-curated datasets are needed. Results: We released a dataset of 1,399 annotated whole-slide images (WSIs) of lymph nodes, both with and without metastases, in 3 terabytes of data in the context of the CAMELYON16 and CAMELYON17 Grand Challenges. Slides were collected from five medical centers to cover a broad range of image appearance and staining variations. Each WSI has a slide-level label indicating whether it contains no metastases, macro-metastases, micro-metastases, or isolated tumor cells. Furthermore, for 209 WSIs, detailed hand-drawn contours for all metastases are provided. Last, open-source software tools to visualize and interact with the data have been made available. Conclusions: A unique dataset of annotated, whole-slide digital histopathology images has been provided with high potential for re-use.
KW - Breast cancer
KW - Grand challenge
KW - Lymph node metastases
KW - Sentinel node
KW - Whole-slide images
KW - Breast Neoplasms/pathology
KW - Humans
KW - Databases as Topic
KW - Lymphatic Metastasis/pathology
KW - Algorithms
KW - Staining and Labeling
KW - Female
KW - Sentinel Lymph Node/pathology
KW - Neoplasm Staging
UR - http://www.scopus.com/inward/record.url?scp=85050878832&partnerID=8YFLogxK
U2 - 10.1093/gigascience/giy065
DO - 10.1093/gigascience/giy065
M3 - Article
C2 - 29860392
AN - SCOPUS:85050878832
VL - 7
JO - GigaScience
JF - GigaScience
IS - 6
M1 - giy065
ER -