Exploiting Diversity for Natural Language Processing
Abstract
Accurate linguistic annotation is a core requirement of natural language processing systems. The demand for accuracy in the face of rapid prototyping constraints and numerous target languages has led to the employment of machine learning methods for developing linguistic annotation systems. The popularity of applying machine learning methods to computational linguistics problems has given rise to a large supply of trainable natural language processing systems. Most problems of interest have an array of off-the-shelf products or downloadable code implementing solutions using various techniques. In situations where these solutions are developed independently, it is observed that their errors tend to be independently distributed. In this thesis we discuss approaches for capitalizing on this situation in a sample problem domain, Penn Treebank-style parsing. The machine learning community provides us with techniques for combining outputs of classifiers, but parser output is more structured and interdependent than classifications. To overcome this, two novel strategies for combining parsers are used: learning to control a switch between parsers and constructing a hybrid parse from multiple parsers' outputs. In this thesis we give supervised and unsupervised techniques for each of these strategies as well as performance and robustness results from evaluation of the techniques. One shortcoming of combining off-the-shelf parsers is that the parsers are not developed with the intention to perform well on complementary data or to compensate for each others' weaknesses. The individual parsers are globally optimized. We present two techniques for producing an ensemble of parsers in such a way that their outputs can be constructively combined. All of the ensemble members will be created using the same underlying parser induction algorithm, and the method for producing complementary parsers is only loosely coupled to that algorithm.
Cite
Text
Henderson. "Exploiting Diversity for Natural Language Processing." AAAI Conference on Artificial Intelligence, 1998.Markdown
[Henderson. "Exploiting Diversity for Natural Language Processing." AAAI Conference on Artificial Intelligence, 1998.](https://mlanthology.org/aaai/1998/henderson1998aaai-exploiting/)BibTeX
@inproceedings{henderson1998aaai-exploiting,
title = {{Exploiting Diversity for Natural Language Processing}},
author = {Henderson, John C.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {1998},
pages = {1174},
url = {https://mlanthology.org/aaai/1998/henderson1998aaai-exploiting/}
}