Selected Publication:
Kreuzthaler, M; Oleynik, M; Schulz, S.
Character-Level Neural Language Modelling in the Clinical Domain.
Stud Health Technol Inform. 2020; 270:83-87
Doi: 10.3233/SHTI200127
- Leading authors Med Uni Graz: Kreuzthaler Markus Eduard
- Co-authors Med Uni Graz: Oleynik Michel; Schulz Stefan
- Abstract:
Word embeddings have become the predominant token-level representation scheme for various clinical natural language processing (NLP) tasks. More recently, character-level neural language models based on recurrent neural networks have again received attention, because they achieved similar performance on various NLP benchmarks. We investigated to what extent character-based language models can be applied to the clinical domain and whether they are able to capture reasonable lexical semantics using this maximally fine-grained representation scheme. We trained a long short-term memory network on an excerpt from a table of de-identified, 50-character-long problem list entries in German, each of which is assigned to an ICD-10 code. We modelled the task as a time series of one-hot encoded single-character inputs. After the training phase, we retrieved the 10 most similar character-induced word embeddings related to a clinical concept via a nearest neighbour search and evaluated the expected interconnected semantics. Results showed that traceable semantics were captured on a syntactic level above single characters, addressing the idiosyncratic nature of clinical language. The results support recent work on general language modelling that raised the question of whether token-based representation schemes are still necessary for specific NLP tasks.
- MeSH terms (NLM MeSH Indexing): Cluster Analysis; Language; Natural Language Processing; Neural Networks, Computer; Patient Care; Semantics
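- Illustrative sketch (not part of the publication):
The abstract names two concrete steps: encoding each 50-character entry as a time series of one-hot character vectors, and retrieving the most similar embeddings by nearest neighbour search. The following minimal Python sketch shows one way these steps could look. It is not the authors' code: the long short-term memory network is not reproduced, cosine similarity is an assumed similarity measure, and the function names, the space padding, and the toy vectors are all hypothetical.

```python
import math

def build_vocab(entries, pad=" "):
    """Map every distinct character (plus the pad character) to an index."""
    chars = sorted(set("".join(entries)) | {pad})
    return {ch: i for i, ch in enumerate(chars)}

def one_hot_sequence(text, vocab, length=50, pad=" "):
    """Encode text as `length` one-hot vectors, one per character.

    Assumption: shorter entries are right-padded with spaces and longer
    ones are truncated. Characters missing from vocab raise KeyError.
    """
    text = text[:length].ljust(length, pad)
    sequence = []
    for ch in text:
        vec = [0] * len(vocab)
        vec[vocab[ch]] = 1
        sequence.append(vec)
    return sequence

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def nearest_neighbours(query, embeddings, k=10):
    """Return the k words most similar to `query`, excluding the query itself."""
    q = embeddings[query]
    scored = [(cosine(q, vec), word)
              for word, vec in embeddings.items() if word != query]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by word
    return [word for _, word in scored[:k]]

# Hypothetical toy vectors, standing in for embeddings a trained model
# would induce from characters. They are NOT results from the paper.
toy_embeddings = {
    "Diabetes": [0.9, 0.1, 0.0],
    "Diab.":    [0.8, 0.2, 0.1],
    "Fraktur":  [0.0, 0.1, 0.9],
    "Fract.":   [0.1, 0.0, 0.8],
}

vocab = build_vocab(["Diabetes mellitus", "Fraktur"])
encoded = one_hot_sequence("Fraktur", vocab)
print(len(encoded), len(encoded[0]))  # time steps, vocabulary size
print(nearest_neighbours("Diabetes", toy_embeddings, k=2))
```

In a full pipeline, the one-hot sequences would be fed to the recurrent network, and the vectors passed to the nearest neighbour search would come from the trained model rather than from a hand-written table.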