Do segmentation metrics reflect clinical reality? A surgeon-centered evaluation in robot-assisted minimally invasive esophagectomy

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

BACKGROUND: Deep learning-based anatomy segmentation holds promise for improving real-time guidance in complex surgeries such as robot-assisted minimally invasive esophagectomy (RAMIE). However, the clinical relevance of commonly used metrics for evaluating segmentation quality remains unclear, as previous assessments have lacked direct input from surgeons. This study aims to assess how well quantitative segmentation metrics reflect surgeons' assessments of anatomical overlay accuracy and clinical usefulness during RAMIE.

METHODS: We conducted a survey involving 26 upper gastrointestinal surgeons, including both trainee and attending surgeons, who assessed video clips of RAMIE procedures featuring deep learning-generated anatomical overlays. We correlated the surgeons' qualitative evaluations of annotation accuracy and clinical usefulness with a comprehensive set of quantitative metrics, including overlap, distance, temporal, and error-specific measures. The analysis encompassed over 8000 manually annotated frames from 12 video clips, with overlays generated by two state-of-the-art deep learning models.

RESULTS: Overlap and temporal consistency metrics show the strongest correlation with surgeon assessments. Distance-based and error-specific metrics correlate moderately. Novices show weaker correlations and tend to rate overlays more leniently. Qualitative feedback reveals issues such as hallucinations and instability that current metrics often miss.

CONCLUSION: Standard quantitative metrics partially reflect surgeon perceptions but should be complemented by surgeon-informed evaluations and task-specific metrics to better capture clinically relevant errors. Aligning metric design with surgical expertise is essential for the safe and effective integration of AI-guided anatomical segmentation in the operating room.
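
The analysis described under METHODS, correlating surgeon ratings with quantitative segmentation metrics, can be illustrated with a minimal sketch. The abstract does not name the specific overlap metrics or the correlation coefficient used, so Dice, IoU, and Spearman rank correlation are assumptions chosen here as common options, and all function names and input values are hypothetical toy examples, not data from the study.

```python
from statistics import mean


def dice(pred, truth):
    """Dice coefficient for two binary masks given as flat sequences of 0/1."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


def iou(pred, truth):
    """Intersection over union for two binary masks given as flat 0/1 sequences."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return 1.0 if union == 0 else inter / union


def _ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: one metric score and one mean surgeon rating per clip.
clip_metric_scores = [0.62, 0.71, 0.80, 0.55, 0.90]
clip_surgeon_ratings = [3.0, 3.5, 4.0, 2.5, 4.5]
rho = spearman(clip_metric_scores, clip_surgeon_ratings)
```

A rank-based coefficient is used in this sketch because survey ratings are typically ordinal; whether the study used this or another coefficient is not stated in the abstract.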

Original language: English
Pages (from-to): 277-290
Number of pages: 14
Journal: Surgical endoscopy
Volume: 40
Issue number: 1
Early online date: 10 Oct 2025
Publication status: Published - Jan 2026

Keywords

  • Anatomy recognition
  • Deep learning
  • Evaluation metrics
  • Robot-assisted surgery
  • Semantic segmentation
  • Survey
