Dynamic Topic Modeling Techniques for Evolving Medical Texts with UMLS Concepts

  • Jayabharathi S, Department of Computer Science, Vellalar College for Women (Autonomous), Thindal, Erode, Tamil Nadu, India
  • Logambal M, Department of Computer Science, Vellalar College for Women (Autonomous), Thindal, Erode, Tamil Nadu, India
Keywords: Topic Modeling, UMLS Integration, Medical Text Analysis, BERTopic

Abstract

In biomedical research, efficiently extracting and categorising topics from large volumes of medical text is crucial for knowledge discovery and information retrieval. Traditional topic modelling approaches are useful, but they often fail to capture the intricate semantics of medical terminology. This study investigates the potential benefits of incorporating Unified Medical Language System (UMLS) concepts into topic modelling, using the MedMentions dataset. We employ four techniques: BERTopic, Latent Dirichlet Allocation (LDA), a hybrid of LDA and Recurrent Neural Networks (RNN), and a novel hybrid of BERTopic and RNN. By incorporating UMLS concepts into these models, we aim to improve topic coherence and relevance. According to our research, in terms of clinical relevance and topic coherence.
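As a rough illustration of what incorporating UMLS concepts into a topic model's input could look like, the sketch below appends one pseudo-token per annotated concept to a document's word tokens before building a bag of words. It is not the authors' pipeline: the function name `augment_with_concepts`, the `(start, end, cui)` mention format, and the `cui_` token prefix are all assumptions made for this example, and the concept identifiers are illustrative only.

```python
from collections import Counter


def augment_with_concepts(text, mentions):
    """Return the document's word tokens followed by one token per concept.

    `mentions` is a list of (start, end, cui) character spans. This layout
    is a hypothetical simplification of concept annotations, not the
    actual MedMentions file format.
    """
    tokens = text.lower().split()
    # Encode each concept identifier as its own vocabulary item so that
    # different surface forms of one concept share a common token.
    concept_tokens = [f"cui_{cui.lower()}" for _, _, cui in mentions]
    return tokens + concept_tokens


doc = "Aspirin reduces the risk of myocardial infarction"
# Illustrative annotations; spans are character offsets into `doc`.
mentions = [(0, 7, "C0004057"), (28, 49, "C0027051")]

tokens = augment_with_concepts(doc, mentions)
bag = Counter(tokens)  # bag-of-words input for a model such as LDA
```

The resulting token list could be passed to any bag-of-words topic model. Embedding-based models such as BERTopic would more likely need the concepts injected as text or as additional features, which this sketch does not attempt.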



Published
2024-12-28
How to Cite
S, J., & M, L. (2024). Dynamic Topic Modeling Techniques for Evolving Medical Texts with UMLS Concepts. International Journal of Computer Communication and Informatics, 6(2), 61-68. https://doi.org/10.34256/ijcci2425


