TY - JOUR
T1 - Comparable Performance Between Automatic and Manual Laryngeal and Hypopharyngeal Gross Tumor Volume Delineations Validated With Pathology
AU - Kuijer, Koen M.
AU - Smits, Hilde J.G.
AU - Doornaert, Patricia A.H.
AU - Niu, Kenan
AU - Savenije, Mark H.F.
AU - Smid, Ernst J.
AU - Terhaard, Chris H.J.
AU - Terpstra, Maarten L.
AU - de Ridder, Mischa
AU - Philippens, Marielle E.P.
N1 - Publisher Copyright: © 2025 The Author(s)
PY - 2025/5/1
Y1 - 2025/5/1
N2 - Purpose: Deep learning is a promising approach to increase reproducibility and time-efficiency of gross tumor volume (GTV) delineation in head and neck cancer, but model evaluation primarily relies on manual GTV delineations as reference annotation, which are subjective and tend to overestimate tumor volume. This study aimed to validate a deep learning model for laryngeal and hypopharyngeal GTV segmentation with pathology and to compare its performance with clinicians’ manual delineations. Methods and Materials: A retrospective data set of 193 patients with laryngeal and hypopharyngeal cancer was used to train a deep learning model with clinical GTV delineations as reference. For validation, a data set comprising 18 patients who underwent imaging before total laryngectomy was used, with histopathology-based (n = 16) tumor delineations as ground truth. The performance of the automatic segmentations was compared with that of clinicians’ manual delineations, both quantitatively and qualitatively. Results: Median sensitivity (0.90 and 0.91) and largest required clinical target volume margin (6.4 and 6.6 mm) were comparable between automatic and manual GTV delineations. The positive predictive value yielded the only significant difference between automatic and manual GTV delineations, with medians of 0.52 and 0.61, respectively (P = .03). Clinical target volumes derived from automatic and manual GTVs exhibited similar sizes (median of 44.5 and 40.1 mL) and achieved a sensitivity of 1.00 in 13/16 and 14/16 tumors, respectively. Automatic segmentations were considered clinically acceptable in 67% of cases, compared with 63% of manual delineations. Conclusions: The proposed deep learning model for laryngeal and hypopharyngeal GTV segmentation achieved comparable results with clinicians’ manual delineations, showing the potential for more consistency and efficiency in the radiation therapy workflow.
UR - http://www.scopus.com/inward/record.url?scp=85215973983&partnerID=8YFLogxK
U2 - 10.1016/j.ijrobp.2024.12.009
DO - 10.1016/j.ijrobp.2024.12.009
M3 - Article
C2 - 39788389
AN - SCOPUS:85215973983
SN - 0360-3016
SP - 186
EP - 193
JO - International Journal of Radiation Oncology Biology Physics
JF - International Journal of Radiation Oncology Biology Physics
IS - 1
ER -