Leveraging heterogeneity of European healthcare data sources to estimate validity of case-finding algorithms in multi-database studies where a true gold standard is lacking: Strategy from the EMIF project

Giuseppe Roberto, Maria Garcia-Gil, Talita Duarte-Salles, Paul Avillach, Elisabeth Smits, Sulev Reisberg, Alessandro Pasqua, Lars Pedersen, Lara Tramontan, Miguel A. Mayer, Ron Herings, Miriam Sturkenboom, Peter Rijnbeek, Rosa Gini

Research output: Contribution to journal, Meeting Abstract, Academic

Abstract

Background: European healthcare databases are heterogeneous in coding terminology, language, underlying health policies and data provenance. Some collect diagnoses from primary care practices (PC), others from inpatient care (INP) and/or death registries (DEATH). In multi‐database studies, this heterogeneity is commonly considered a weakness.

Objectives: To leverage the heterogeneity of healthcare databases to estimate the sensitivity (SE) and positive predictive value (PPV) of case‐finding algorithms when a gold standard is not available.

Methods: We measured the incidence of acute myocardial infarction (AMI) as a test case. Five databases were considered: SIDIAP (Spain), HSD and ARS (Italy), PHARMO (Netherlands) and AUH (Denmark). HSD provided diagnoses from PC; SIDIAP and PHARMO from PC and INP; ARS and AUH from INP and DEATH. The Unified Medical Language System was used to project the AMI concept to the local terminologies (ICD9CM, ICD10, ICPC, READ). Three standardized AMI‐finding algorithms, one per data provenance (PC, INP and DEATH), were created. In each database, cases were retrieved using all available algorithms. The cumulative incidence (CI) of AMI was estimated in 2012 among subjects aged 45+ with ≥2 years of look‐back. Results were compared within and across databases. According to previous validation studies, the PPV of INP was 100% in AUH and the PPV of PC in HSD was 96.6%. To estimate the algorithms' SE and PPV in all databases, we made three assumptions: (i) PPV = 100% for INP; (ii) SE = 100% for the combination INP or DEATH; (iii) cases retrieved from PC were true positives only if also found in INP.

Results: The study population comprised about 4 million subjects. CI (cases/10,000 persons) ranged between 7.8 and 24.8 for PC, 30.3 and 47.3 for INP, and 8.9 and 9.4 for DEATH. Cases identified from two provenances overlapped only partially. INP identified 28.9% and 30.5% of DEATH cases in ARS and AUH, respectively, and 44.7% and 44.1% of PC cases in SIDIAP and PHARMO, respectively. Based on our assumptions, conservative estimates of the SE of INP were 83.4% in ARS and 77.5% in AUH; the PPV of PC was 44.1% in PHARMO and 44.7% in SIDIAP. Assuming that the SE of INP in SIDIAP and PHARMO equalled the average of the ARS and AUH estimates (80.5%), conservative estimates of the SE of PC in PHARMO and SIDIAP were 26.7% and 29.4%, respectively. The average of those two estimates, 28.1%, could be assumed to be the SE of PC in HSD.
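
The estimates above can be sketched in code. The following is one possible reading of the three assumptions, not the authors' implementation: the function names and the case counts in the usage lines are hypothetical, and only the percentages passed to `mean` are taken from the abstract.

```python
def se_inp(n_inp: int, n_death: int, n_inp_and_death: int) -> float:
    """SE of INP, assuming every true case appears in INP or DEATH."""
    n_union = n_inp + n_death - n_inp_and_death  # size of INP-or-DEATH
    return n_inp / n_union


def ppv_pc(n_pc: int, n_pc_and_inp: int) -> float:
    """PPV of PC, counting a PC case as true only if also found in INP."""
    return n_pc_and_inp / n_pc


def se_pc(n_pc_and_inp: int, n_inp: int, se_inp_borrowed: float) -> float:
    """SE of PC, with the number of true cases estimated as n_inp / SE(INP)."""
    n_true = n_inp / se_inp_borrowed
    return n_pc_and_inp / n_true


def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


# Hypothetical case counts, for illustration only (not study data).
example_se_inp = se_inp(n_inp=400, n_death=100, n_inp_and_death=20)
example_ppv_pc = ppv_pc(n_pc=200, n_pc_and_inp=90)
example_se_pc = se_pc(n_pc_and_inp=90, n_inp=400, se_inp_borrowed=0.8)

# Averages of the percentages quoted in the abstract.
avg_se_inp = mean([83.4, 77.5])  # the abstract reports 80.5%
avg_se_pc = mean([26.7, 29.4])   # the abstract reports 28.1%
```

Under this reading, the estimates are conservative because DEATH-only cases are all counted as true (which lowers the SE of INP) and PC-only cases are all counted as false (which lowers the PPV of PC).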

Conclusions: In multi‐database studies, when de novo validation is not possible, existing information and assumptions can be exploited to provide a range of validity estimates and adjust study results to account for event misclassification.
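
As an illustration of such an adjustment, one common correction multiplies an observed count or incidence by the PPV and divides it by the SE. This is a minimal sketch, not the procedure used in the study: the function name and the observed incidence are hypothetical, while the PPV and SE values are the SIDIAP estimates for PC quoted in the Results.

```python
def adjust_for_misclassification(observed: float, ppv: float, se: float) -> float:
    """Correct an observed incidence for false positives (PPV) and missed cases (SE)."""
    if not (0 < ppv <= 1 and 0 < se <= 1):
        raise ValueError("ppv and se must be proportions in (0, 1]")
    # observed * ppv keeps the true positives; dividing by se adds back missed cases.
    return observed * ppv / se


# Hypothetical observed incidence (cases per 10,000 persons), not study data.
observed_ci = 20.0
adjusted_ci = adjust_for_misclassification(observed_ci, ppv=0.447, se=0.294)
```

Running the same function over the lower and upper validity estimates for a database would give a range of adjusted results rather than a single value.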
Original language: English
Pages (from-to): 109-109
Journal: Pharmacoepidemiology and Drug Safety
Volume: 28
Issue number: S2
Publication status: Published - Aug 2019
