Machine learning-enabled systematic review on coded healthcare data in heart failure research

  • Asgher Champsi
  • Karin T Slater
  • Simrat Gill
  • Tomasz Dyszynski
  • Megan Schröder
  • Kiliana Suzart-Woischnik
  • Benoit Tyl
  • Guillaume Allée
  • Alfonso Sartorius
  • R Thomas Lumbers
  • Folkert W Asselbergs
  • Diederick E Grobbee
  • Georgios Gkoutos
  • Dipak Kotecha*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

AIMS: Coded healthcare data are now commonly used in clinical research. This study aimed to assess the transparency of reporting within heart failure studies and employ machine learning to facilitate larger-scale evaluation.

METHODS & RESULTS: A systematic search of EMBASE and MEDLINE (2015-2020) identified 4279 heart failure studies with accessible Extensible Markup Language (XML) published in the top 25 journals by impact factor. Manual extraction by independent human reviewers in a random sample of 170 studies identified 40 studies (23.5%) that used coded healthcare data; 34 of these (85%) reported doing so, and only 19 (47.5%) clearly described dataset construction and linkage. A further 420 studies were manually annotated to train a Natural Language Processing (NLP) model, designed for this study, to automate and scale up the review. The NLP model processed the remaining 3689 studies with a high level of internal accuracy (area under the receiver operating characteristic curve 0.97 and F1 score 0.96). Overall, the NLP approach identified 782 studies (21.2%) that reported coded healthcare data usage (95% CI 19.8-20.9%). No correlation was found between the reporting of coded healthcare data use and publication year (r = -0.05; P = 0.21) or citation count (r = -0.13; P = 0.12).
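The counts and percentages reported above can be re-derived with a few lines of arithmetic. The following is a minimal sketch, not the authors' analysis code: the function names are hypothetical, and the normal-approximation (Wald) interval is an assumption, since the abstract does not state how its confidence interval was calculated, so the result need not match the reported bounds.

```python
import math


def percentage(count: int, total: int) -> float:
    """Return count/total as a percentage."""
    return 100.0 * count / total


def wald_interval(count: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% interval for a proportion, in percent.

    Assumption: the abstract does not name its interval method; this is
    one common choice, used here purely for illustration.
    """
    p = count / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return 100.0 * (p - half_width), 100.0 * (p + half_width)


# Counts taken from the abstract.
total_studies = 4279   # studies identified by the search
manual_sample = 170    # random sample reviewed manually
annotated_set = 420    # additional studies annotated to train the NLP model
nlp_processed = total_studies - manual_sample - annotated_set

used_coded_data = 40   # manual sample: studies using coded healthcare data
reported_use = 34      # of those, studies that reported doing so
clear_description = 19 # of those, clear dataset construction and linkage
nlp_identified = 782   # studies flagged by the NLP model

print("Processed by NLP model:", nlp_processed)
print("Used coded data (%):", round(percentage(used_coded_data, manual_sample), 1))
print("Reported use (%):", round(percentage(reported_use, used_coded_data), 1))
print("Clear description (%):", round(percentage(clear_description, used_coded_data), 1))
print("NLP-identified (%):", round(percentage(nlp_identified, nlp_processed), 1))

low, high = wald_interval(nlp_identified, nlp_processed)
print(f"Wald 95% interval (%): {low:.1f} to {high:.1f}")
```

A Wilson or exact (Clopper-Pearson) interval would be a reasonable alternative; at this sample size the choice changes the bounds only slightly.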

CONCLUSION: One-fifth of contemporary heart failure research articles are already reporting the use of coded healthcare data, with at-scale evaluation facilitated by a machine-learning model. The limited transparency on how coded healthcare data were used in studies highlights the need for quality standards such as the CODE-EHR framework for the use of healthcare data in research.

Original language: English
Article number: ztaf123
Journal: European Heart Journal - Digital Health
Volume: 7
Issue number: 1
Publication status: Published - Jan 2026
