Selected publication:
Kugic, A; Schulz, S; Kreuzthaler, M.
Term Candidate Generation to Enrich Clinical Terminologies with Large Language Models.
Stud Health Technol Inform. 2024; 316: 695-699.
DOI: 10.3233/SHTI240509
- Leading authors at Med Uni Graz:
  - Kugic Amila
- Co-authors at Med Uni Graz:
  - Kreuzthaler Markus Eduard
  - Schulz Stefan
- Abstract:
- Annotated language resources derived from clinical routine documentation are a valuable asset for secondary use scenarios. In this investigation, we report on how such a resource can be leveraged to identify additional term candidates for a chosen set of ICD-10 codes. We conducted a log-likelihood analysis of the co-occurrence of approximately 1.9 million de-identified ICD-10 codes with the corresponding short German-language problem-list entries. Candidates reaching statistical significance at p < 0.01 were then used as seed terms to harvest further candidates by querying a large language model in a second step. The proposed approach identifies additional term candidates with the following performance: hypernyms MAP@5 = 0.801, synonyms MAP@5 = 0.723 and hyponyms MAP@5 = 0.507. Re-using existing annotated clinical datasets in combination with large language models is a promising strategy to bridge the lexical gap between standardized clinical terminologies and real-world clinical jargon.
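The first step of the abstract, a log-likelihood co-occurrence analysis between ICD-10 codes and problem-list texts, can be illustrated with a minimal sketch. It assumes Dunning's G² statistic on a 2×2 contingency table per (code, text) pair, with each problem-list record contributing exactly one code and one text, and the whole normalized text treated as the candidate term; the abstract does not state the exact formulation, tokenization or normalization, so these are illustrative choices. The p-value uses the χ² approximation with one degree of freedom, and only positive associations are kept. The function names and the toy records below are hypothetical and not taken from the publication.

```python
import math
from collections import Counter


def g2_2x2(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio statistic G^2 for a 2x2 table.

    Rows: code present / absent; columns: term present / absent.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # the term 0 * ln(0) is taken as 0
                e = rows[i] * cols[j] / n
                g2 += o * math.log(o / e)
    return 2.0 * g2


def chi2_df1_pvalue(x):
    """Survival function of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def term_candidates(pairs, code, alpha=0.01):
    """Return (term, G2, p) for terms positively associated with `code` at p < alpha.

    Hypothetical helper: `pairs` is a list of (icd10_code, problem_list_text) records.
    """
    n = len(pairs)
    code_count = sum(1 for c, _ in pairs if c == code)
    term_count = Counter(t for _, t in pairs)
    joint = Counter(t for c, t in pairs if c == code)
    results = []
    for term, k11 in joint.items():
        k12 = code_count - k11            # code without this term
        k21 = term_count[term] - k11      # term with another code
        k22 = n - k11 - k12 - k21         # neither
        expected = code_count * term_count[term] / n
        if k11 <= expected:
            continue  # skip negative or neutral associations
        g2 = g2_2x2(k11, k12, k21, k22)
        p = chi2_df1_pvalue(g2)
        if p < alpha:
            results.append((term, g2, p))
    # Strongest associations first; these would serve as seed terms for the LLM step
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy records (code, short German problem-list text)
    pairs = (
        [("I10", "aHT")] * 30
        + [("I10", "art. Hypertonie")] * 20
        + [("I10", "Asthma")] * 1
        + [("E11.9", "DM Typ 2")] * 40
        + [("E11.9", "aHT")] * 2
        + [("J45", "Asthma")] * 30
    )
    for term, g2, p in term_candidates(pairs, "I10"):
        print(f"{term}\tG2={g2:.1f}\tp={p:.2e}")
```

With these toy records, "aHT" and "art. Hypertonie" pass the p < 0.01 threshold for I10, while the single I10/"Asthma" record is discarded because it occurs less often than expected under independence. For one degree of freedom, p < 0.01 corresponds to G² above roughly 6.63.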
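The results are reported as MAP@5 per relation type (hypernyms, synonyms, hyponyms). The following sketch shows one common way to compute this metric. It assumes that, for each seed term, the LLM returned a ranked list of candidates that was judged relevant (1) or not relevant (0); the abstract does not describe how relevance was assessed. Because the complete set of valid terms is unknown when generated output is judged manually, AP@k is normalized here by the number of relevant items in the top k. Variants that divide by min(k, number of known relevant items) give different values. The function names are hypothetical.

```python
def average_precision_at_k(relevance, k=5):
    """AP@k for one ranked list of binary relevance judgments (1 = relevant).

    Assumption: normalised by the number of relevant items within the top k.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def mean_average_precision_at_k(rankings, k=5):
    """MAP@k: mean of AP@k over all ranked lists (e.g. one list per seed term)."""
    if not rankings:
        raise ValueError("at least one ranked list is required")
    return sum(average_precision_at_k(r, k) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Hypothetical judgments for three seed terms, five candidates each
    judged = [[1, 0, 1, 0, 0], [0, 1, 1, 1, 0], [0, 0, 0, 0, 0]]
    print(f"MAP@5 = {mean_average_precision_at_k(judged):.3f}")
```

For the first list, relevant candidates at ranks 1 and 3 give AP@5 = (1/1 + 2/3) / 2 = 5/6. A list without any relevant candidate contributes 0 to the mean.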
- MeSH terms (NLM MeSH indexing):
  - International Classification of Diseases
  - Vocabulary, Controlled
  - Natural Language Processing
  - Humans
  - Terminology as Topic
  - Electronic Health Records - classification
  - Germany