Medizinische Universität Graz - Research portal

Selected Publication:

Kreuzthaler, M; Schulz, S.
Truecasing clinical narratives.
Stud Health Technol Inform. 2011; 169: 589-593.

Leading authors Med Uni Graz
Kreuzthaler Markus Eduard
Schulz Stefan
Abstract:
Truecasing, or capitalization, is the rewriting of each word of an input text with its proper case information. Many medical texts, especially those from legacy systems, are still written entirely in capital letters, which hampers their readability. We present a pilot study that uses the World Wide Web as a corpus to support automatic truecasing. The texts under scrutiny were German-language pathology reports. By submitting token bigrams to the Google Web search engine, we collected enough case information to achieve 81.3% accuracy for acronyms and 98.5% accuracy for normal words. This is all the more impressive as only half of the words used in this corpus existed in a standard medical dictionary, owing to the excessive use of ad-hoc single-word nominal compounds in German. Our system performed less satisfactorily for spelling correction, and in three cases the proposed word substitutions altered the meaning of the input sentence. For routine deployment of this method, the dependency on a (black box) search engine must be overcome, for example by using cloud-based Web n-gram services.
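The bigram idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the search-engine lookup is replaced by a small in-memory table of cased bigrams, and the table `OBSERVED_BIGRAMS`, its counts, and the functions `build_index` and `truecase` are hypothetical names and made-up data chosen for illustration. Each bigram of the all-caps input is looked up case-insensitively, and every cased variant found votes, weighted by its count, on the casing of both of its tokens.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for search-engine evidence: cased bigrams with
# made-up counts. A real system would gather these from a web corpus.
OBSERVED_BIGRAMS = Counter({
    ("chronische", "Gastritis"): 120,
    ("Chronische", "Gastritis"): 15,
    ("Gastritis", "ohne"): 80,
    ("ohne", "Nachweis"): 200,
    ("Nachweis", "von"): 300,
    ("von", "HP"): 40,
    ("von", "Hp"): 5,
})


def build_index(observed):
    """Map each case-folded bigram to the cased variants seen for it."""
    index = defaultdict(Counter)
    for (left, right), count in observed.items():
        index[(left.lower(), right.lower())][(left, right)] += count
    return index


def truecase(tokens, index):
    """Restore case by letting every bigram vote on its two tokens."""
    votes = [Counter() for _ in tokens]
    for i in range(len(tokens) - 1):
        key = (tokens[i].lower(), tokens[i + 1].lower())
        for (left, right), count in index.get(key, {}).items():
            votes[i][left] += count
            votes[i + 1][right] += count
    # Tokens without any evidence fall back to lowercase (an assumption).
    return [v.most_common(1)[0][0] if v else t.lower()
            for v, t in zip(votes, tokens)]


index = build_index(OBSERVED_BIGRAMS)
print(" ".join(truecase("CHRONISCHE GASTRITIS OHNE NACHWEIS VON HP".split(), index)))
```

With the toy counts above, the call prints "chronische Gastritis ohne Nachweis von HP". Sentence-initial capitalization and spelling correction are deliberately left out of this sketch.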
MeSH terms (NLM MeSH indexing):
Algorithms
Documentation - methods
Documentation
Humans
Information Storage and Retrieval - methods
Internet
Language
Medical Records
Pattern Recognition, Automated - methods
Pattern Recognition, Visual - physiology
Reading
Reproducibility of Results
Semantics
