TY - JOUR
T1 - Data mining information from electronic health records produced high yield and accuracy for current smoking status
AU - Groenhof, T Katrien J
AU - Koers, Laurien R
AU - Blasse, Enja
AU - de Groot, Mark
AU - Grobbee, Diederick E
AU - Bots, Michiel L
AU - Asselbergs, Folkert W
AU - Lely, A Titia
AU - Haitjema, Saskia
N1 - Funding Information:
Declaration of interest: The UCC is primarily financed by the UMC Utrecht. A grant from the Netherlands Organisation for Health Research and Development (#8480-34001) was obtained to develop feedback procedures. UCC website: www.umuctrecht.nl/ucc (in Dutch). Contact information of UCC: [email protected].
Publisher Copyright:
© 2019 The Authors
Copyright:
Copyright 2019 Elsevier B.V., All rights reserved.
PY - 2020/2/1
Y1 - 2020/2/1
AB - Objectives: Researchers are increasingly using routine clinical data for care evaluations and feedback to patients and clinicians. The quality of these evaluations depends on the quality and completeness of the input data. Study Design and Setting: We assessed the performance of an electronic health record (EHR)-based data mining algorithm, using the example of smoking status in a cardiovascular population. As a reference standard, we used the questionnaire from the Utrecht Cardiovascular Cohort (UCC). To assess diagnostic accuracy, we calculated sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV). Results: We analyzed 1,661 patients included in the UCC up to January 18, 2019. Of those, 14% (n = 238) had missing information on smoking status in the UCC questionnaire. Data mining provided information on smoking status in 99% of the 1,661 participants. Diagnostic accuracy for current smoking was sensitivity 88%, specificity 92%, NPV 98%, and PPV 63%. Of the false positives, 85% reported they had quit smoking at the time of the UCC. Conclusion: Data mining showed great potential in retrieving information on smoking (a near-complete yield). Its diagnostic performance is good for negative smoking statuses. The implications of misclassification with data mining depend on the application of the data.
KW - Data mining
KW - Data quality
KW - Electronic health records
KW - Learning healthcare system
KW - Routine clinical data
KW - Text mining
UR - https://www.scopus.com/pages/publications/85076834377
U2 - 10.1016/j.jclinepi.2019.11.006
DO - 10.1016/j.jclinepi.2019.11.006
M3 - Article
C2 - 31730918
SN - 0895-4356
VL - 118
SP - 100
EP - 106
JO - Journal of Clinical Epidemiology
JF - Journal of Clinical Epidemiology
ER -