Benchmarking Pretrained Attention-based Models for Real-Time Recognition in Robot-Assisted Esophagectomy

Ronald L.P.D. de Jong, Yasmina al Khalil, Tim J.M. Jaspers, Romy C. van Jaarsveld, Gino M. Kuiper, Yiping Li, Richard van Hillegersberg, Jelle P. Ruurda, Marcel Breeuwer, Fons van der Sommen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Esophageal cancer is among the most common types of cancer worldwide. It is traditionally treated using open esophagectomy, but in recent years, robot-assisted minimally invasive esophagectomy (RAMIE) has emerged as a promising alternative. However, robot-assisted surgery can be challenging for novice surgeons, as they often suffer from a loss of spatial orientation. Computer-aided anatomy recognition holds promise for improving surgical navigation, but research in this area remains limited. In this study, we developed a comprehensive dataset for semantic segmentation in RAMIE, featuring the largest collection of vital anatomical structures and surgical instruments to date. Handling this diverse set of classes presents challenges, including class imbalance and the recognition of complex structures such as nerves. This study aims to understand the challenges and limitations of current state-of-the-art algorithms on this novel dataset and problem. Therefore, we benchmarked eight real-time deep learning models using two pretraining datasets. We assessed both traditional and attention-based networks, hypothesizing that attention-based networks better capture global patterns and address challenges such as occlusion caused by blood or other tissues. The benchmark includes our RAMIE dataset and the publicly available CholecSeg8k dataset, enabling a thorough assessment of surgical segmentation tasks. Our findings indicate that pretraining on ADE20k, a dataset for semantic segmentation, is more effective than pretraining on ImageNet. Furthermore, attention-based models outperform traditional convolutional neural networks, with SegNeXt and Mask2Former achieving higher Dice scores, and Mask2Former additionally excelling in average symmetric surface distance.
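
For readers unfamiliar with the Dice score reported above, the following is a minimal sketch of how a per-class Dice score can be computed for a multi-class segmentation mask. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `dice_per_class`, the toy label maps, and the choice to return `None` for a class absent from both masks are all hypothetical.

```python
def dice_per_class(pred, target, num_classes):
    """Dice score per class for two equally sized 2D label maps (lists of rows).

    Dice(c) = 2 * |P_c ∩ T_c| / (|P_c| + |T_c|), where P_c and T_c are the
    pixels labelled c in the prediction and the reference annotation.
    """
    flat_pred = [label for row in pred for label in row]
    flat_target = [label for row in target for label in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("prediction and target must have the same size")

    scores = {}
    for c in range(num_classes):
        pred_count = sum(1 for p in flat_pred if p == c)
        target_count = sum(1 for t in flat_target if t == c)
        overlap = sum(
            1 for p, t in zip(flat_pred, flat_target) if p == c and t == c
        )
        denom = pred_count + target_count
        # Assumption: a class absent from both masks is reported as None
        # rather than being counted as a perfect or a zero score.
        scores[c] = None if denom == 0 else 2 * overlap / denom
    return scores


# Toy 3x3 label maps with three hypothetical classes (0, 1, 2).
pred = [[0, 0, 1],
        [0, 1, 1],
        [2, 2, 1]]
target = [[0, 0, 1],
          [0, 0, 1],
          [2, 1, 1]]

scores = dice_per_class(pred, target, num_classes=3)
for c, s in scores.items():
    print(f"class {c}: Dice = {s:.3f}")
```

Averaging such per-class scores gives each class equal weight regardless of its pixel count, which is one common way to keep small structures from being dominated by large ones under class imbalance; whether the study averages this way is not stated in the abstract.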

Original language: English
Title of host publication: Medical Imaging 2025
Subtitle of host publication: Image-Guided Procedures, Robotic Interventions, and Modeling
Editors: Maryam E. Rettmann, Jeffrey H. Siewerdsen
Publisher: SPIE
ISBN (Electronic): 9781510685949
Publication status: Published - 2025
Event: Medical Imaging 2025: Image-Guided Procedures, Robotic Interventions, and Modeling - San Diego, United States
Duration: 17 Feb 2025 - 20 Feb 2025

Publication series

Name: Progress in Biomedical Optics and Imaging - Proceedings of SPIE
Volume: 13408
ISSN (Print): 1605-7422

Conference

Conference: Medical Imaging 2025: Image-Guided Procedures, Robotic Interventions, and Modeling
Country/Territory: United States
City: San Diego
Period: 17/02/25 - 20/02/25

Keywords

  • Anatomy recognition
  • cholecystectomy
  • computer vision
  • deep learning
  • esophagectomy
  • robotics
  • semantic segmentation
  • surgery
