TY - JOUR
T1 - DEDUCE
T2 - A pattern matching method for automatic de-identification of Dutch medical text
AU - Menger, Vincent
AU - Scheepers, Floor
AU - van Wijk, Lisette Maria
AU - Spruit, Marco R
PY - 2018/7/1
Y1 - 2018/7/1
N2 - In order to use medical text for research purposes, it is necessary to de-identify the text for legal and privacy reasons. We report on a pattern matching method to automatically de-identify medical text written in Dutch, which requires little effort to hand-tailor. First, a selection of Protected Health Information (PHI) categories is determined in cooperation with medical staff. Then, we devise a method for de-identifying all information in any of these PHI categories, which relies on lookup tables, decision rules and fuzzy string matching. Our de-identification method DEDUCE is validated on a test corpus of 200 nursing notes and 200 treatment plans obtained from the University Medical Center Utrecht (UMCU) in the Netherlands, achieving a total micro-averaged precision of 0.814, a recall of 0.916 and an F1-score of 0.862. For person names, a recall of 0.964 was achieved, while no names of patients were missed.
AB - In order to use medical text for research purposes, it is necessary to de-identify the text for legal and privacy reasons. We report on a pattern matching method to automatically de-identify medical text written in Dutch, which requires little effort to hand-tailor. First, a selection of Protected Health Information (PHI) categories is determined in cooperation with medical staff. Then, we devise a method for de-identifying all information in any of these PHI categories, which relies on lookup tables, decision rules and fuzzy string matching. Our de-identification method DEDUCE is validated on a test corpus of 200 nursing notes and 200 treatment plans obtained from the University Medical Center Utrecht (UMCU) in the Netherlands, achieving a total micro-averaged precision of 0.814, a recall of 0.916 and an F1-score of 0.862. For person names, a recall of 0.964 was achieved, while no names of patients were missed.
KW - De-identification
KW - Dutch medical text
KW - Patient privacy
KW - Pattern matching
KW - Protected Health Information
UR - http://www.scopus.com/inward/record.url?scp=85027576173&partnerID=8YFLogxK
U2 - 10.1016/j.tele.2017.08.002
DO - 10.1016/j.tele.2017.08.002
M3 - Article
AN - SCOPUS:85027576173
SN - 0736-5853
VL - 35
SP - 727
EP - 736
JO - Telematics and Informatics
JF - Telematics and Informatics
IS - 4
ER -