An Order-Sensitive Hierarchical Neural Model for Early Lung Cancer Detection Using Dutch Primary Care Notes and Structured Data

Iacopo Vagliano*, Miguel Rios, Mohanad Abukmeil, Martijn C. Schut, Torec T. Luik, Kristel M. van Asselt, Henk C.P.M. van Weert, Ameen Abu-Hanna

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background: Improving prediction models for the timely detection of lung cancer is paramount. Our aim is to develop and validate prediction models for early detection of lung cancer in primary care, based on free-text consultation notes, that exploit the order and context of words and sentences. Methods: Data from all patients registered in 49 general practices between 2002 and 2021 were assessed, and we included those older than 30 years with at least one free-text note. We developed two models using a hierarchical architecture that relies on attention and bidirectional long short-term memory networks. One model used only text, while the other combined text with clinical variables. The models were trained on data excluding the five months leading up to the diagnosis, using target replication and a tuning set, and were tested on a separate dataset for discrimination, positive predictive value (PPV), and calibration. Results: A total of 250,021 patients were registered, of whom 1507 had a lung cancer diagnosis. The analysis included 183,012 patients, of whom 712 had the diagnosis. Of the two models, the combined model performed slightly better, achieving on the test set an AUROC of 0.91, an AUPRC of 0.05, and a PPV of 0.034 (0.024, 0.043), with good calibration. To detect one cancer patient early, 29 high-risk patients would require additional diagnostic testing. Conclusions: Our models showed excellent discrimination by leveraging word and sentence structure. Including clinical variables in addition to text slightly improved performance. The number needed to treat (here, the number of high-risk patients requiring additional testing per cancer detected) holds promise for clinical practice. External validation and an investigation of the models' suitability for clinical practice are warranted.
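The figure of 29 patients per detected cancer appears consistent with the reciprocal of the reported PPV (1 / 0.034 ≈ 29.4). The short Python sketch below reproduces that arithmetic from the numbers in the abstract. It is an illustration under stated assumptions, not the authors' calculation: the function name `number_needed` is hypothetical, the range is obtained by simply inverting the PPV interval bounds, and the prevalence is taken from the whole included cohort (712 of 183,012), which may differ from the prevalence in the test set.

```python
def number_needed(ppv: float) -> float:
    """Flagged patients per true case, taken as the reciprocal of the PPV.

    Hypothetical helper for illustration; the paper may have derived
    its figure differently.
    """
    if not 0.0 < ppv <= 1.0:
        raise ValueError("ppv must be in (0, 1]")
    return 1.0 / ppv


# Values reported in the abstract.
ppv, ppv_low, ppv_high = 0.034, 0.024, 0.043
included, cases = 183_012, 712

nn_point = number_needed(ppv)
# A higher PPV means fewer patients to test, so the bounds swap.
nn_range = (number_needed(ppv_high), number_needed(ppv_low))

# Assumption: cohort-level prevalence stands in for test-set prevalence.
prevalence = cases / included
# How much more likely a flagged patient is to have the diagnosis
# than a patient picked at random from the cohort.
lift = ppv / prevalence

print(f"Number needed: {nn_point:.1f} (range {nn_range[0]:.1f} to {nn_range[1]:.1f})")
print(f"Cohort prevalence: {prevalence:.4%}")
print(f"PPV relative to prevalence: {lift:.1f}x")
```

The prevalence also gives context for the AUPRC: a classifier that ranks patients at random has an expected AUPRC close to the prevalence, roughly 0.004 here, so the reported 0.05 is small in absolute terms but well above that baseline.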

Original language: English
Article number: 1151
Number of pages: 17
Journal: Cancers
Volume: 17
Issue number: 7
DOIs
Publication status: Published - Apr 2025

Keywords

  • early detection
  • hierarchical attention network
  • lung cancer
  • machine learning
  • natural language processing
  • prediction models
  • primary care
  • word embeddings