A Statistical Method for Handling Unknown Words
Abstract
Robust natural language processing systems must be able to handle words that are not in their lexicon. We created a classifier, trained on tagged text, that finds the most likely parts of speech for unknown words. The classifier uses a contingency table to count the observed features and a loglinear model to smooth the cell counts. After smoothing, the contingency table is used to obtain the conditional probability distribution for classification. A number of features, determined by exploration (Tukey 1977), are used: for example, whether the word is capitalized, or whether it carries one of a number of known suffixes. We maximize the conditional probability of the proposed classification given the features to achieve minimum error rate classification (Duda & Hart 1973). The baseline results are provided by using only the prior probabilities P(c) (column Prior). Weischedel et al. (1993) describe a probabilistic model with four features that are treated as independent, which we reimplemented (column 4 Indep). For comparison, we created a statistical classifier with the same four features (column 4 Class). Our best model was a classifier with nine features (column 9 Class).
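The classification step described in the abstract can be illustrated with a minimal sketch: count feature/tag cells in a contingency table, smooth the counts, and pick the tag with the highest conditional probability given the features. This is not the paper's implementation. The two features, the suffix list, the function names, and the toy data are all hypothetical, and simple additive smoothing is swapped in for the loglinear model purely to keep the example short.

```python
from collections import Counter

# Hypothetical suffix list for illustration; not the paper's feature set.
SUFFIXES = ("ing", "ed", "ly", "s")


def features(word):
    """Map a word to a tuple of discrete feature values."""
    capitalized = word[:1].isupper()
    suffix = next((s for s in SUFFIXES if word.lower().endswith(s)), "")
    return (capitalized, suffix)


def train(tagged_words):
    """Count the (features, tag) cells of the contingency table."""
    counts = Counter()
    tags = set()
    for word, tag in tagged_words:
        counts[(features(word), tag)] += 1
        tags.add(tag)
    return counts, sorted(tags)


def classify(word, counts, tags, alpha=1.0):
    """Return the tag maximizing P(tag | features), plus the distribution.

    Assumption: additive smoothing (alpha) stands in for the loglinear
    smoothing that the abstract describes.
    """
    f = features(word)
    smoothed = {t: counts[(f, t)] + alpha for t in tags}
    total = sum(smoothed.values())
    dist = {t: c / total for t, c in smoothed.items()}
    return max(dist, key=dist.get), dist


# Toy tagged text (hypothetical), standing in for a real tagged corpus.
sample = [
    ("Boston", "NNP"), ("London", "NNP"),
    ("running", "VBG"), ("jumping", "VBG"),
    ("quickly", "RB"), ("walked", "VBD"),
]
counts, tags = train(sample)
best_tag, distribution = classify("Zurich", counts, tags)
```

Calling `classify` on a word absent from the training data returns both the chosen tag and the full conditional distribution, so the prior-only baseline mentioned in the abstract could be compared by ignoring the features and using tag frequencies alone.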
Cite
Text
Franz. "A Statistical Method for Handling Unknown Words." AAAI Conference on Artificial Intelligence, 1994.
Markdown
[Franz. "A Statistical Method for Handling Unknown Words." AAAI Conference on Artificial Intelligence, 1994.](https://mlanthology.org/aaai/1994/franz1994aaai-statistical/)
BibTeX
@inproceedings{franz1994aaai-statistical,
title = {{A Statistical Method for Handling Unknown Words}},
author = {Franz, Alexander},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {1994},
pages = {1447},
url = {https://mlanthology.org/aaai/1994/franz1994aaai-statistical/}
}