Medizinische Universität Graz - Research portal


Selected Publication:


Oleynik, M; Nohama, P; Cancian, PS; Schulz, S.
Performance analysis of a POS tagger applied to discharge summaries in Portuguese.
Stud Health Technol Inform. 2010; 160(Pt 2):959-963

 

Leading authors Med Uni Graz
Oleynik Michel
Co-authors Med Uni Graz
Schulz Stefan
Abstract:
Part-of-speech (POS) taggers need a considerable amount of data to train their models. Such data is not readily available for medical texts in Portuguese. We evaluated the accuracy of a morphological tagger against a gold standard when trained with corpora of different sizes and domains. Training on a medical corpus gave the highest accuracy throughout the training process, reaching 91.5%, whereas training on a newswire corpus reached only 75.3%. Furthermore, an active learning technique was adapted to the POS tagging task: the algorithm uses a committee of POS taggers to isolate the sentences with the highest disagreement indexes for manual correction. However, the method did not reduce training and tagging times compared with a random selection strategy. We encourage future work to invest effort in annotating a small amount of random data in the domain of study, which should be enough to achieve higher accuracy rates.
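
The committee-based selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the disagreement index, so the definition used here (the fraction of tokens on which the committee members do not all assign the same tag) is an assumption, as are the names disagreement_index and select_for_annotation and the toy rule-based taggers. It is not the authors' implementation.

```python
from typing import Callable, List, Sequence

Sentence = Sequence[str]
Tagger = Callable[[Sentence], List[str]]  # maps tokens to one tag per token


def disagreement_index(sentence: Sentence, committee: Sequence[Tagger]) -> float:
    """Fraction of tokens whose tags differ across the committee.

    Assumed definition: the abstract does not specify how disagreement
    is measured.
    """
    if not sentence:
        return 0.0
    predictions = [tagger(sentence) for tagger in committee]
    # zip(*predictions) yields, per token, the tags proposed by each member.
    disputed = sum(1 for tags in zip(*predictions) if len(set(tags)) > 1)
    return disputed / len(sentence)


def select_for_annotation(
    pool: Sequence[Sentence], committee: Sequence[Tagger], k: int
) -> List[Sentence]:
    """Return the k sentences with the highest disagreement index."""
    ranked = sorted(
        pool, key=lambda s: disagreement_index(s, committee), reverse=True
    )
    return ranked[:k]


# Hypothetical toy taggers standing in for trained models.
def tagger_a(sentence: Sentence) -> List[str]:
    return ["N" for _ in sentence]


def tagger_b(sentence: Sentence) -> List[str]:
    return ["V" if token.endswith("u") else "N" for token in sentence]


committee = [tagger_a, tagger_b]
pool = [
    ["paciente", "evoluiu", "bem"],
    ["alta", "hospitalar"],
    ["negou", "febre"],
]

# The selected sentences would go to a human annotator for correction.
selected = select_for_annotation(pool, committee, k=1)
print(selected)
```

In a full active learning loop, the corrected sentences would be added to the training set and the committee retrained before the next selection round. The random selection baseline from the abstract corresponds to sampling k sentences from the pool instead of ranking them.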
MeSH terms (NLM MeSH Indexing):
Algorithms
Information Storage and Retrieval - methods
Natural Language Processing
Semantics
Terminology as Topic
