TY - JOUR
T1 - Clinical assessment of deep learning-based uncertainty maps in lung cancer segmentation
AU - Maruccio, Federica Carmen
AU - Eppinga, Wietse S C
AU - Laves, Max-Heinrich
AU - Fonolla Navarro, Roger
AU - Salvi, Massimo
AU - Molinari, Filippo
AU - Papaconstadopoulos, Pavlos
N1 - Publisher Copyright:
© 2024 Institute of Physics and Engineering in Medicine.
PY - 2024/2/7
Y1 - 2024/2/7
N2 - OBJECTIVE: Prior to radiation therapy planning, accurate delineation of gross tumour volumes (GTVs) and organs at risk (OARs) is crucial. In current clinical practice, tumour delineation is performed manually by radiation oncologists, which is time-consuming and prone to large inter-observer variability. With the advent of deep learning (DL) models, automated contouring has become possible, speeding up procedures and assisting clinicians. However, these tools are currently used in the clinic mostly for contouring OARs, since they are not yet reliable for contouring GTVs. To improve the reliability of these systems, researchers have started exploring probabilistic neural networks. However, there is still limited knowledge of the practical implementation of such networks in real clinical settings. APPROACH: In this work, we developed a 3D probabilistic system that generates DL-based uncertainty maps for lung cancer CT segmentations. We employed the Monte Carlo (MC) dropout technique to generate probability and uncertainty maps, and evaluated model calibration using reliability diagrams. A clinical validation was conducted in collaboration with a radiation oncologist to qualitatively assess the value of the uncertainty estimates. We also proposed two novel metrics, mean uncertainty (MU) and relative uncertainty volume (RUV), as potential indicators for clinicians to assess the need for independent visual checks of the DL-based segmentation. MAIN RESULTS: Our study showed that uncertainty mapping effectively identified cases of under- or over-contouring. Despite the overconfidence of the model, a strong correlation was observed between the clinical opinion and the MU metric. Moreover, both MU and RUV achieved high AUC values in discriminating between low- and high-uncertainty cases. SIGNIFICANCE: Our study is one of the first attempts to clinically validate uncertainty estimates in DL-based contouring.
The two proposed metrics exhibited promising potential as indicators for clinicians to independently assess the quality of tumour delineation.
AB - OBJECTIVE: Prior to radiation therapy planning, accurate delineation of gross tumour volumes (GTVs) and organs at risk (OARs) is crucial. In current clinical practice, tumour delineation is performed manually by radiation oncologists, which is time-consuming and prone to large inter-observer variability. With the advent of deep learning (DL) models, automated contouring has become possible, speeding up procedures and assisting clinicians. However, these tools are currently used in the clinic mostly for contouring OARs, since they are not yet reliable for contouring GTVs. To improve the reliability of these systems, researchers have started exploring probabilistic neural networks. However, there is still limited knowledge of the practical implementation of such networks in real clinical settings. APPROACH: In this work, we developed a 3D probabilistic system that generates DL-based uncertainty maps for lung cancer CT segmentations. We employed the Monte Carlo (MC) dropout technique to generate probability and uncertainty maps, and evaluated model calibration using reliability diagrams. A clinical validation was conducted in collaboration with a radiation oncologist to qualitatively assess the value of the uncertainty estimates. We also proposed two novel metrics, mean uncertainty (MU) and relative uncertainty volume (RUV), as potential indicators for clinicians to assess the need for independent visual checks of the DL-based segmentation. MAIN RESULTS: Our study showed that uncertainty mapping effectively identified cases of under- or over-contouring. Despite the overconfidence of the model, a strong correlation was observed between the clinical opinion and the MU metric. Moreover, both MU and RUV achieved high AUC values in discriminating between low- and high-uncertainty cases. SIGNIFICANCE: Our study is one of the first attempts to clinically validate uncertainty estimates in DL-based contouring.
The two proposed metrics exhibited promising potential as indicators for clinicians to independently assess the quality of tumour delineation.
KW - Monte Carlo dropout
KW - U-Net
KW - clinical validation
KW - contouring
KW - deep learning
KW - lung cancer
KW - uncertainty map
UR - http://www.scopus.com/inward/record.url?scp=85183328004&partnerID=8YFLogxK
U2 - 10.1088/1361-6560/ad1a26
DO - 10.1088/1361-6560/ad1a26
M3 - Article
C2 - 38171012
SN - 0031-9155
VL - 69
JO - Physics in medicine and biology
JF - Physics in medicine and biology
IS - 3
M1 - 035007
ER -