TY - JOUR
T1 - CodeMapper: semiautomatic coding of case definitions. A contribution from the ADVANCE project
AU - Becker, Benedikt F.H.
AU - Avillach, Paul
AU - Romio, Silvana
AU - van Mulligen, Erik M.
AU - Weibel, Daniel
AU - Sturkenboom, Miriam C.J.M.
AU - Kors, Jan A.
N1 - Funding Information:
We would like to thank all investigators of the SAFEGUARD consortium for the code sets. The CodeMapper application was developed in the ADVANCE project, which received support from the Innovative Medicines Initiative Joint Undertaking under ADVANCE grant agreement no. 115557, with financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in-kind contribution.
Publisher Copyright:
© 2017 The Authors. Pharmacoepidemiology & Drug Safety Published by John Wiley & Sons Ltd.
PY - 2017/8/1
Y1 - 2017/8/1
AB - Background: Assessment of drug and vaccine effects by combining information from different healthcare databases in the European Union requires extensive efforts in the harmonization of codes as different vocabularies are being used across countries. In this paper, we present a web application called CodeMapper, which assists in the mapping of case definitions to codes from different vocabularies, while keeping a transparent record of the complete mapping process. Methods: CodeMapper builds upon coding vocabularies contained in the Metathesaurus of the Unified Medical Language System. The mapping approach consists of three phases. First, medical concepts are automatically identified in a free-text case definition. Second, the user revises the set of medical concepts by adding or removing concepts, or expanding them to related concepts that are more general or more specific. Finally, the selected concepts are projected to codes from the targeted coding vocabularies. We evaluated the application by comparing codes that were automatically generated from case definitions by applying CodeMapper's concept identification and successive concept expansion, with reference codes that were manually created in a previous epidemiological study. Results: Automated concept identification alone had a sensitivity of 0.246 and positive predictive value (PPV) of 0.420 for reproducing the reference codes. Three successive steps of concept expansion increased sensitivity to 0.953 and PPV to 0.616. Conclusions: Automatic concept identification in the case definition alone was insufficient to reproduce the reference codes, but CodeMapper's operations for concept expansion provide an effective, efficient, and transparent way for reproducing the reference codes.
KW - concept identification
KW - database extraction
KW - multiple medical vocabularies
KW - semantic operations
KW - UMLS
UR - http://www.scopus.com/inward/record.url?scp=85021452458&partnerID=8YFLogxK
U2 - 10.1002/pds.4245
DO - 10.1002/pds.4245
M3 - Article
AN - SCOPUS:85021452458
SN - 1053-8569
VL - 26
SP - 998
EP - 1005
JO - Pharmacoepidemiology and Drug Safety
JF - Pharmacoepidemiology and Drug Safety
IS - 8
ER -