TY - JOUR
T1 - Improving Uncertainty-Error Correspondence in Deep Bayesian Medical Image Segmentation
AU - Mody, Prerak
AU - Chaves-de-Plaza, Nicolas F.
AU - Rao, Chinmay
AU - Astrenidou, Eleftheria
AU - de Ridder, Mischa
AU - Hoekstra, Nienke
AU - Hildebrandt, Klaus
AU - Staring, Marius
PY - 2024/8/31
Y1 - 2024/8/31
N2 - Increased usage of automated tools like deep learning in medical image segmentation has alleviated the bottleneck of manual contouring. This has shifted manual labour to quality assessment (QA) of automated contours, which involves detecting errors and correcting them. A potential solution for semi-automated QA is to use deep Bayesian uncertainty to recommend potentially erroneous regions, thus reducing the time spent on error detection. Previous work has investigated the correspondence between uncertainty and error; however, no work has been done on improving the “utility” of Bayesian uncertainty maps such that uncertainty is present only in inaccurate regions and not in accurate ones. Our work trains the FlipOut model with the Accuracy-vs-Uncertainty (AvU) loss, which promotes uncertainty to be present only in inaccurate regions. We apply this method to datasets from two radiotherapy body sites, namely head-and-neck CT and prostate MR scans. Uncertainty heatmaps (i.e. predictive entropy) are evaluated against voxel inaccuracies using Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves. Numerical results show that, compared to the Bayesian baseline, the proposed method successfully suppresses uncertainty for accurate voxels while retaining a similar presence of uncertainty for inaccurate voxels. Code to reproduce experiments is available at https://github.com/prerakmody/bayesuncertainty-error-correspondence
AB - Increased usage of automated tools like deep learning in medical image segmentation has alleviated the bottleneck of manual contouring. This has shifted manual labour to quality assessment (QA) of automated contours, which involves detecting errors and correcting them. A potential solution for semi-automated QA is to use deep Bayesian uncertainty to recommend potentially erroneous regions, thus reducing the time spent on error detection. Previous work has investigated the correspondence between uncertainty and error; however, no work has been done on improving the “utility” of Bayesian uncertainty maps such that uncertainty is present only in inaccurate regions and not in accurate ones. Our work trains the FlipOut model with the Accuracy-vs-Uncertainty (AvU) loss, which promotes uncertainty to be present only in inaccurate regions. We apply this method to datasets from two radiotherapy body sites, namely head-and-neck CT and prostate MR scans. Uncertainty heatmaps (i.e. predictive entropy) are evaluated against voxel inaccuracies using Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves. Numerical results show that, compared to the Bayesian baseline, the proposed method successfully suppresses uncertainty for accurate voxels while retaining a similar presence of uncertainty for inaccurate voxels. Code to reproduce experiments is available at https://github.com/prerakmody/bayesuncertainty-error-correspondence
KW - Bayesian Deep Learning
KW - Bayesian Uncertainty
KW - Uncertainty-Error Correspondence
KW - Uncertainty Calibration
KW - Contour Quality Assessment
KW - Model Calibration
U2 - 10.59275/j.melba.2024-5gc8
DO - 10.59275/j.melba.2024-5gc8
M3 - Article
SN - 2766-905X
VL - 2
SP - 1048
EP - 1082
JO - Journal of Machine Learning for Biomedical Imaging
JF - Journal of Machine Learning for Biomedical Imaging
M1 - 2024:018
ER -