Selected Publication:
Abdulnazar, A; Roller, R; Schulz, S; Kreuzthaler, M.
Large Language Models for Clinical Text Cleansing Enhance Medical Concept Normalization
IEEE Access. 2024; 12: 147981-147990.
DOI: 10.1109/ACCESS.2024.3472500
- Leading authors Med Uni Graz:
  - Kreuzthaler Markus Eduard
  - Kuppassery Abdulnazar Akhila Naz
- Co-authors Med Uni Graz:
  - Schulz Stefan
- Abstract:
- Most clinical information is only available as free text. Large language models (LLMs) are increasingly applied to clinical data to streamline communication, enhance the accuracy of clinical documentation, and ultimately improve healthcare delivery. This study focuses on a corpus of anonymized clinical narratives in German. On the one hand, it evaluates the use of ChatGPT for text cleansing, i.e., the automatic rephrasing of raw text into a more readable and standardized form, and on the other hand for retrieval-augmented generation (RAG). In both tasks, the final goal was medical concept normalization (MCN), i.e., the annotation of text segments with codes from a controlled vocabulary using natural language processing. We found that ChatGPT (GPT-4) significantly improves precision and recall compared to simple dictionary matching. For all scenarios, the importance of the underlying terminological basis was also demonstrated. Maximum F1 scores of 0.607, 0.735 and 0.754 (i.e., for top 1, 5 and 10 matches) were achieved through a pipeline including document cleansing, bi-encoder-based term matching based on a large domain dictionary linked to SNOMED CT, and finally re-ranking using RAG.
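To illustrate the ranked term matching and top-k evaluation mentioned in the abstract, here is a minimal sketch. It is not the authors' implementation. A character-trigram cosine similarity stands in for the bi-encoder, the dictionary and its codes (`CODE_A`, `CODE_B`, `CODE_C`) are hypothetical placeholders rather than SNOMED CT identifiers, and `hit_at_k` is a simplified measure, not the F1 computation used in the paper.

```python
import math
from collections import Counter

# Hypothetical toy dictionary: term -> placeholder concept code (not real SNOMED CT codes).
TOY_DICTIONARY = {
    "myocardial infarction": "CODE_A",
    "heart attack": "CODE_A",
    "diabetes mellitus": "CODE_B",
    "hypertension": "CODE_C",
}


def trigrams(text):
    """Count character trigrams of a lowercased, padded string."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_codes(mention, dictionary, k):
    """Return up to k distinct codes, ranked by similarity of their terms to the mention."""
    query = trigrams(mention)
    ranked = sorted(
        dictionary.items(),
        key=lambda item: cosine(query, trigrams(item[0])),
        reverse=True,
    )
    codes = []
    for _term, code in ranked:
        if code not in codes:
            codes.append(code)
        if len(codes) == k:
            break
    return codes


def hit_at_k(gold_pairs, dictionary, k):
    """Fraction of (mention, gold_code) pairs whose gold code appears in the top k."""
    hits = sum(1 for mention, gold in gold_pairs if gold in top_k_codes(mention, dictionary, k))
    return hits / len(gold_pairs)
```

Under these assumptions, a call such as `top_k_codes("myocardial infarct", TOY_DICTIONARY, 5)` returns a ranked candidate list that a later re-ranking step could reorder; widening k from 1 to 5 or 10 can only keep or raise the hit rate, which matches the direction of the top-1, top-5 and top-10 scores reported above.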
- Keywords:
  - Chatbots
  - Accuracy
  - Medical services
  - Large language models
  - Unified modeling language
  - Codes
  - Biological system modeling
  - Training
  - Data integrity
  - Text analysis
  - Natural language processing
  - Clinical diagnosis
  - ChatGPT
  - medical concept normalization
  - retrieval augmented generation
  - text cleansing