TY - JOUR
T1 - Text-mining in electronic healthcare records can be used as efficient tool for screening and data-collection in cardiovascular trials
T2 - a multicenter validation study
AU - van Dijk, Wouter B
AU - Fiolet, Aernoud T L
AU - Schuit, Ewoud
AU - Sammani, Arjan
AU - Groenhof, T Katrien J
AU - van der Graaf, Rieke
AU - de Vries, Martine C
AU - Alings, Marco
AU - Schaap, Jeroen
AU - Asselbergs, Folkert W
AU - Grobbee, Diederick E
AU - Groenwold, Rolf H H
AU - Mosterd, Arend
N1 - Funding Information:
Funding: This work was supported by the Netherlands Organisation for Health Research and Development (ZonMW) (grant number 91217027). A. Sammani was funded by the University Medical Center Utrecht Alexandre Suerman Stipendium. Folkert Asselbergs was supported by UCL Hospitals NIHR Biomedical Research.
Publisher Copyright:
© 2020 The Authors
PY - 2021/4
Y1 - 2021/4
AB - Objective: This study aimed to validate trial patient eligibility screening and baseline data collection using text-mining in electronic healthcare records (EHRs), comparing the results to those of an international trial. Study Design and Setting: In three medical centers with different EHR vendors, EHR-based text-mining was used to automatically screen patients for trial eligibility and extract baseline data on nineteen characteristics. First, the yield of screening with automated EHR text-mining search was compared with manual screening by research personnel. Second, the accuracy of baseline data extracted by EHR text-mining was compared to manual data entry by research personnel. Results: Of the 92,466 patients visiting the outpatient cardiology departments, 568 (0.6%) were enrolled in the trial during its recruitment period using manual screening methods. Automated EHR data screening of all patients showed that the number of patients needed to screen could be reduced by 73,863 (79.9%). The remaining 18,603 (20.1%) contained 458 of the actual participants (82.4% of participants). In trial participants, automated EHR text-mining missed a median of 2.8% (interquartile range [IQR] across all variables 0.4–8.5%) of all data points compared to manually collected data. The overall accuracy of automatically extracted data was 88.0% (IQR 84.7–92.8%). Conclusion: Automatically extracting data from EHRs using text-mining can be used to identify trial participants and to collect baseline information.
KW - Cardiovascular
KW - Data-collections
KW - Data-mining
KW - Electronic healthcare records (EHRs)
KW - Electronic medical records (EMRs)
KW - LoDoCo2
KW - Multicenter
KW - Recruitment
KW - Screening
KW - Text-mining
KW - Trials
UR - http://www.scopus.com/inward/record.url?scp=85099209663&partnerID=8YFLogxK
U2 - 10.1016/j.jclinepi.2020.11.014
DO - 10.1016/j.jclinepi.2020.11.014
M3 - Article
C2 - 33248277
SN - 0895-4356
VL - 132
SP - 97
EP - 105
JO - Journal of Clinical Epidemiology
JF - Journal of Clinical Epidemiology
ER -