Part-of-Speech Tagging Using Decision Trees
Abstract
We have applied inductive learning of statistical decision trees to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (part-of-speech tagging). Previous work showed that the acquired language models are independent enough to be easily incorporated, as a statistical core of rules, into any flexible tagger. They are also complete enough to be used directly as sets of POS disambiguation rules. We have implemented a simple and fast tagger that has been tested and evaluated on the Wall Street Journal (WSJ) corpus with remarkable accuracy. In this paper we mainly address the problem of tagging when only a small amount of training material is available, which is crucial in any process of constructing an annotated corpus from scratch. We show that high accuracy can be achieved with our system in this situation. In addition, we address the problem of dealing with unknown words under the same conditions of scarce training examples. For this case, we report some comparative results and comments on closely related work.
Cite
Text
Màrquez and Rodríguez. "Part-of-Speech Tagging Using Decision Trees." European Conference on Machine Learning, 1998. doi:10.1007/BFB0026668
Markdown
[Màrquez and Rodríguez. "Part-of-Speech Tagging Using Decision Trees." European Conference on Machine Learning, 1998.](https://mlanthology.org/ecmlpkdd/1998/marquez1998ecml-partofspeech/) doi:10.1007/BFB0026668
BibTeX
@inproceedings{marquez1998ecml-partofspeech,
title = {{Part-of-Speech Tagging Using Decision Trees}},
author = {Màrquez, Lluís and Rodríguez, Horacio},
booktitle = {European Conference on Machine Learning},
year = {1998},
pages = {25--36},
doi = {10.1007/BFB0026668},
url = {https://mlanthology.org/ecmlpkdd/1998/marquez1998ecml-partofspeech/}
}