Kreuzthaler, M; Pfeifer, B; Schulz, S.
Secondary Use of Clinical Problem List Descriptions for Bi-Encoder Based ICD-10 Classification.
AMIA Annu Symp Proc. 2024; 2024:620-627
Annotated language resources are essential for supervised machine learning. In the clinical domain, such data sets can boost use-case-specific natural language processing services. In this work, we analyzed a clinical problem list table consisting of millions of ICD-10 codes assigned to short problem list descriptions in German. We investigated whether these data form a valuable resource for coding support in a secondary use scenario. Our proposed methodology uses an embedding-based k-NN classifier, whose coding performance was evaluated with the multilingual BERT-based language model SapBERT-UMLS and, for comparison, with medBERT.de, which is specifically tailored to medical and clinical language resources in German. The approach reached a weighted F1-measure of 0.87 using SapBERT-UMLS and an F1-measure of 0.86 using medBERT.de. These results indicate promising coding performance when reusing annotated language resources from routine clinical documentation.
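The embedding-based k-NN idea and the weighted F1-measure named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the character-trigram `embed` function is a toy stand-in for a bi-encoder such as SapBERT-UMLS or medBERT.de, and the majority-vote rule, the value of `k`, the helper names, and the sample descriptions are all assumptions made for illustration only.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy stand-in for a bi-encoder: sparse character n-gram count vector."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(key, 0) for key, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, index, k=3):
    """Return the majority ICD-10 code among the k most similar index entries."""
    q = embed(query)
    nearest = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)[:k]
    votes = Counter(code for _, code in nearest)
    return votes.most_common(1)[0][0]


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        total += n_true * (2 * tp / denom if denom else 0.0)
    return total / len(y_true)


# Hypothetical problem list entries (invented for this sketch, not study data).
examples = [
    ("Diabetes mellitus Typ 2", "E11"),
    ("Typ-2-Diabetes", "E11"),
    ("arterielle Hypertonie", "I10"),
    ("Hypertonie, arteriell", "I10"),
    ("Vorhofflimmern", "I48"),
    ("paroxysmales Vorhofflimmern", "I48"),
]
# Embed every labeled description once; this list plays the role of the k-NN index.
index = [(embed(text), code) for text, code in examples]

if __name__ == "__main__":
    print(knn_predict("Diabetes Typ 2", index))
    print(weighted_f1(["E11", "E11", "I10", "I48"], ["E11", "I10", "I10", "I48"]))
```

In a realistic setting, `embed` would be replaced by dense vectors from a pretrained language model, and the sorted scan by an approximate nearest-neighbor index, since the table described above contains millions of entries.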