TY - GEN
T1 - Exploring Embedding Spaces for more Coherent Topic Modeling in Electronic Health Records
AU - Rijcken, Emil
AU - Zervanou, Kalliopi
AU - Spruit, Marco
AU - Mosteiro, Pablo
AU - Scheepers, Floortje
AU - Kaymak, Uzay
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - The written notes in the Electronic Health Records contain a vast amount of information about patients. Implementing automated approaches for text classification tasks requires the automated methods to be well-interpretable, and topic models can be used for this goal as they can indicate what topics in a text are relevant to making a decision. We propose a new topic modeling algorithm, FLSA-E, and compare it with another state-of-the-art algorithm FLSA-W. In FLSA-E, topics are found by fuzzy clustering in a word embedding space. Since we use word embeddings as the basis for our clustering, we extend our evaluation with word-embeddings-based evaluation metrics. We find that different evaluation metrics favour different algorithms. Based on the results, there is evidence that FLSA-E has fewer outliers in its topics, a desirable property, given that within-topic words need to be semantically related.
KW - Electronic Health Records
KW - Fuzzy Clustering
KW - Fuzzy Methods
KW - Natural Language Processing
KW - Neural Network methods
KW - Psychiatry
KW - Topic Modeling
KW - Word Embeddings
UR - http://www.scopus.com/inward/record.url?scp=85142699260&partnerID=8YFLogxK
U2 - 10.1109/SMC53654.2022.9945594
DO - 10.1109/SMC53654.2022.9945594
M3 - Conference contribution
AN - SCOPUS:85142699260
T3 - Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
SP - 2669
EP - 2674
BT - 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022
Y2 - 9 October 2022 through 12 October 2022
ER -