Significant Lexical Relationships
Abstract
Statistical NLP inevitably deals with a large number of rare events. As a consequence, NLP data often violates the assumptions implicit in traditional statistical procedures such as significance testing. We describe a significance test, an exact conditional test, that is appropriate for NLP data and can be performed using freely available software. We apply this test to the study of lexical relationships and demonstrate that the results obtained using this test are both theoretically more reliable and different from the results obtained using previously applied tests.
Introduction
Statistical Natural Language Processing (NLP) seeks to make general claims about human language from an empirical study of examples of human speech or text. Empirical studies of language implicitly or explicitly define a probabilistic model for the characteristic being studied. In significance testing, the probabilistic model is a potential description of the distribution of that characteristic in the popula...
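As a rough illustration of the kind of exact conditional test the abstract describes, the sketch below computes a one-sided Fisher's exact test for a 2x2 contingency table of bigram counts. Treating Fisher's test as the stand-in is an assumption made here for illustration; the function name `fisher_exact_right_tail` and the counts in the usage example are hypothetical and are not taken from the paper.

```python
from math import comb


def fisher_exact_right_tail(n11, n12, n21, n22):
    """One-sided exact p-value for a 2x2 table with fixed margins.

    Hypothetical layout for a bigram (w1, w2):
        n11: w1 followed by w2        n12: w1 followed by another word
        n21: another word before w2   n22: neither w1 nor w2
    Returns P(X >= n11) under the hypergeometric distribution that
    holds when the row and column totals are treated as fixed.
    """
    row1 = n11 + n12
    col1 = n11 + n21
    total = n11 + n12 + n21 + n22

    # Number of ways to choose which observations fall in column 1.
    denom = comb(total, col1)

    # Sum the probabilities of all tables at least as extreme as the
    # observed one (same margins, top-left cell >= n11).
    upper = min(row1, col1)
    numer = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(n11, upper + 1)
    )
    return numer / denom


if __name__ == "__main__":
    # Hypothetical counts for a rare bigram in a small corpus.
    p_value = fisher_exact_right_tail(3, 7, 12, 9978)
    print(f"one-sided exact p-value: {p_value:.3g}")
```

Because the probability is summed directly over the possible tables, the calculation does not rely on a large-sample approximation, which is the property that matters when most cell counts are very small.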
Cite
Text
Pedersen et al. "Significant Lexical Relationships." AAAI Conference on Artificial Intelligence, 1996.
Markdown
[Pedersen et al. "Significant Lexical Relationships." AAAI Conference on Artificial Intelligence, 1996.](https://mlanthology.org/aaai/1996/pedersen1996aaai-significant/)
BibTeX
@inproceedings{pedersen1996aaai-significant,
title = {{Significant Lexical Relationships}},
author = {Pedersen, Ted and Kayaalp, Mehmet and Bruce, Rebecca F.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {1996},
pages = {455-460},
url = {https://mlanthology.org/aaai/1996/pedersen1996aaai-significant/}
}