Selected Publication:
Oleynik, M; Nohama, P; Cancian, PS; Schulz, S.
Performance analysis of a POS tagger applied to discharge summaries in Portuguese.
Stud Health Technol Inform. 2010; 160(Pt 2):959-963
- Leading authors Med Uni Graz: Oleynik Michel
- Co-authors Med Uni Graz: Schulz Stefan
- Abstract:
Part-of-speech (POS) taggers need a considerable amount of data to train their models, and such data are not readily available for medical texts in Portuguese. We evaluated the accuracy of a morphological tagger against a gold standard when trained on corpora of different sizes and domains. Accuracy was highest with a medical corpus throughout the training process, reaching 91.5%, whereas training on a newswire corpus reached only 75.3%. Furthermore, an active learning technique was adapted to the POS tagging task. The algorithm uses a committee of POS taggers to isolate the sentences with the highest disagreement indexes for manual correction. However, the method did not reduce training and tagging times compared with a random selection strategy. We recommend that future work annotate a small amount of random data from the domain of study, which should be enough to achieve higher accuracy rates.
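The committee-based selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the disagreement index is computed, so this example assumes vote entropy averaged over the tokens of a sentence, a common choice in query-by-committee active learning. All function names and the toy taggers are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy (in bits) of the committee's tag votes for one token.

    Assumption: vote entropy stands in for the paper's unspecified
    "disagreement index". It is 0 when all taggers agree.
    """
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sentence_disagreement(taggings):
    """Mean per-token vote entropy.

    `taggings` holds one tag sequence per committee member, all of the
    same length as the sentence.
    """
    n_tokens = len(taggings[0])
    if n_tokens == 0:
        return 0.0
    per_token = (vote_entropy([t[i] for t in taggings]) for i in range(n_tokens))
    return sum(per_token) / n_tokens


def select_for_annotation(sentences, committee, k):
    """Return the k sentences on which the committee disagrees most.

    `committee` is a list of callables mapping a token list to a tag list.
    Ties are broken by original order.
    """
    scored = [
        (sentence_disagreement([tag(s) for tag in committee]), idx)
        for idx, s in enumerate(sentences)
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sentences[idx] for _, idx in scored[:k]]


# Toy committee (hypothetical): three taggers that only disagree on "alta".
toy_committee = [
    lambda s: ["N"] * len(s),
    lambda s: ["ADJ" if w == "alta" else "N" for w in s],
    lambda s: ["V" if w == "alta" else "N" for w in s],
]
toy_sentences = [["paciente", "febre"], ["alta", "hospitalar"]]
to_correct = select_for_annotation(toy_sentences, toy_committee, k=1)
```

In an active learning loop, the selected sentences would be corrected manually, added to the training corpus, and the committee retrained. The random selection baseline in the abstract corresponds to replacing `select_for_annotation` with a random sample of size `k`.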
- Find related publications in this database (using NLM MeSH Indexing)
  - Algorithms
  - Information Storage and Retrieval - methods
  - Natural Language Processing
  - Semantics
  - Terminology as Topic