Selected Publication:
Abdulnazar, A; Roller, R; Schulz, S; Kreuzthaler, M.
Unsupervised SapBERT-based bi-encoders for medical concept annotation of clinical narratives with SNOMED CT
DIGIT HEALTH. 2024; 10: 20552076241288681
DOI: 10.1177/20552076241288681
[OPEN ACCESS]
- Leading authors Med Uni Graz:
- Kreuzthaler Markus Eduard
- Kuppassery Abdulnazar Akhila Naz
- Co-authors Med Uni Graz:
- Schulz Stefan
- Abstract:
- OBJECTIVE: Clinical narratives provide comprehensive patient information. Achieving interoperability involves mapping relevant details to standardized medical vocabularies. Typically, natural language processing divides this task into named entity recognition (NER) and medical concept normalization (MCN). State-of-the-art results require supervised setups with abundant training data. However, annotated data are scarce because the material is sensitive and annotation is time-consuming. This study addresses the need for unsupervised medical concept annotation (MCA) to overcome these limitations and support the creation of annotated datasets.
- METHOD: We use an unsupervised SapBERT-based bi-encoder model to analyze n-grams from narrative text and measure their similarity to SNOMED CT concepts. As a final step, we apply a syntactical re-ranker. For evaluation, we use the semantic tags of SNOMED CT candidates to assess the NER phase and their concept IDs to assess the MCN phase. The approach is evaluated with both English and German narratives.
- RESULT: Without training data, our unsupervised approach achieves an F1 score of 0.765 in English and 0.557 in German for MCN. Evaluation at the semantic tag level reveals that "disorder" has the highest F1 scores, 0.871 and 0.648 on the English and German datasets, respectively. Furthermore, the MCA approach on the semantic tag "disorder" shows F1 scores of 0.839 and 0.696 in English and 0.685 and 0.437 in German for NER and MCN, respectively.
- CONCLUSION: This unsupervised approach demonstrates potential for initial annotation (pre-labeling) in manual annotation tasks. While promising for certain semantic tags, challenges remain, including false positives, contextual errors, and variability of clinical language, requiring further fine-tuning.
- Keywords:
- Named entity recognition
- medical concept normalization
- SNOMED CT
- natural language processing
- interoperability
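
The METHOD part of the abstract (n-grams from the narrative, embedded and compared with SNOMED CT concepts) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: `embed` is a toy character-trigram encoder standing in for the SapBERT bi-encoder, the concept IDs `C1` to `C3` are hypothetical placeholders and not real SNOMED CT identifiers, the threshold of 0.8 is arbitrary, and the final overlap filter is a simple greedy heuristic, not the syntactical re-ranker described in the paper.

```python
import math
from collections import Counter


def ngrams(tokens, max_n=3):
    """Yield (start, end, text) for every word n-gram of up to max_n tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield i, i + n, " ".join(tokens[i:i + n])


def embed(text):
    """Toy stand-in for a SapBERT encoder: a bag of character trigrams."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical mini terminology: ID -> (term, semantic tag).
CONCEPTS = {
    "C1": ("myocardial infarction", "disorder"),
    "C2": ("diabetes mellitus", "disorder"),
    "C3": ("chest pain", "finding"),
}
# Concept embeddings are computed once, independently of the input text
# (this independence is what makes the setup a bi-encoder).
CONCEPT_VECS = {cid: embed(term) for cid, (term, _) in CONCEPTS.items()}


def annotate(text, threshold=0.8):
    """Return non-overlapping (span, concept_id, semantic_tag) annotations."""
    tokens = text.split()
    candidates = []
    for start, end, span in ngrams(tokens):
        vec = embed(span)
        cid, score = max(
            ((c, cosine(vec, v)) for c, v in CONCEPT_VECS.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            candidates.append((score, start, end, span, cid))

    # Greedy overlap resolution (an assumption, not the paper's re-ranker):
    # prefer higher scores, then longer spans.
    candidates.sort(key=lambda c: (-c[0], -(c[2] - c[1])))
    kept = []
    for score, start, end, span, cid in candidates:
        if all(end <= s or start >= e for _, s, e, _, _ in kept):
            kept.append((score, start, end, span, cid))

    kept.sort(key=lambda c: c[1])
    return [(span, cid, CONCEPTS[cid][1]) for _, _, _, span, cid in kept]


if __name__ == "__main__":
    note = "patient reports chest pain and history of myocardial infarction"
    for annotation in annotate(note):
        print(annotation)
```

In this sketch the semantic tag of the best-matching concept plays the role of the NER label and the concept ID plays the role of the MCN result, mirroring how the abstract separates the two evaluation levels. Swapping `embed` for a real sentence encoder would require dense vectors and a nearest-neighbour index, which are omitted here.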