Empirical Acquisition of Word-Sense Distinctions

Abstract

Applications currently make use of many kinds of lexical information: categorial relations (dog is-a canine), synonymy (pooch same-as dog) and word associations (walk occurs-often-with dog). However, an important type of information, differentia, is often omitted, especially in broad-coverage applications. Differentia are properties that distinguish a concept from others belonging to the same higher-level category. For instance, both beagles and wolfhounds are hounds, but the former are small, whereas the latter are quite large. Applications should incorporate differentia to provide finer word-sense distinctions and to facilitate inference of information not mentioned in the text. Determining the differentia is a difficult task, since the available knowledge sources define these properties using natural language. For instance, WordNet (Miller 1990), a commonly used source of lexical knowledge, provides explicit information on categorial relationships but leaves the differentia implicit in the definitions. This work will investigate empirical approaches for extracting these properties from machine-readable dictionaries (MRDs) and text corpora. The result will be lexical relations between the word being defined and words used in the definition. There has been some work on deriving differentia, but it has relied predominantly on manually developed heuristics. Here, corpus-derived associations will augment such heuristics for extracting information from MRDs. Furthermore, this work will investigate a novel use of Bayesian networks for representing the various types of lexical knowledge in order to model the uncertainty in the relations and support integration of statistical and analytical knowledge. Dictionary definitions use certain fixed patterns, often with prepositional phrases, to indicate differentia. However, since prepositions are highly ambiguous, the same pattern can be used for different properties.
To address this problem, syntactic pattern matching will be applied to each definition to identify potential properties. Then, statistical classification will be used to select the most plausible ones. To support this work, a representative sample of definitions will be annotated to indicate the properties that apply. This will serve as the primary training data for the classifier. To allow for a fallback mechanism, a separate classifier will be trained on the semantic role annotations in the second

Copyright © 1998, American Association for Artificial In
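
The two-stage approach described in the abstract (pattern matching to propose candidate properties, then statistical classification to choose among them) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the paper: the preposition list, the two-word window used to pick a phrase head, the relation labels (`has-part`, `purpose`, `material`), the tiny hand-labeled training set, and the choice of a naive Bayes classifier as a stand-in for the unspecified statistical classifier.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical preposition inventory; the abstract does not list one.
PREPOSITIONS = {"with", "for", "of", "in", "from", "by"}


def candidate_phrases(definition):
    """Stage 1: crude pattern matching for (preposition, head word) pairs.

    The head is taken to be the last of up to two tokens following the
    preposition, stopping early at another preposition (an assumption).
    """
    tokens = re.findall(r"[a-z]+", definition.lower())
    pairs = []
    for i, tok in enumerate(tokens):
        if tok not in PREPOSITIONS:
            continue
        window = []
        for nxt in tokens[i + 1:i + 3]:
            if nxt in PREPOSITIONS:
                break
            window.append(nxt)
        if window:
            pairs.append((tok, window[-1]))
    return pairs


class NaiveBayes:
    """Stage 2: naive Bayes over two features with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feat_counts = defaultdict(Counter)  # label -> feature -> count
        self.vocab = set()

    def train(self, examples):
        for prep, head, label in examples:
            self.label_counts[label] += 1
            for feat in (("prep", prep), ("head", head)):
                self.feat_counts[label][feat] += 1
                self.vocab.add(feat)

    def log_score(self, prep, head, label):
        total = sum(self.label_counts.values())
        score = math.log(self.label_counts[label] / total)
        denom = sum(self.feat_counts[label].values()) + len(self.vocab)
        for feat in (("prep", prep), ("head", head)):
            score += math.log((self.feat_counts[label][feat] + 1) / denom)
        return score

    def classify(self, prep, head):
        return max(self.label_counts,
                   key=lambda label: self.log_score(prep, head, label))


# Hypothetical annotated sample standing in for the annotated definitions.
TRAIN = [
    ("with", "ears", "has-part"),
    ("with", "handle", "has-part"),
    ("with", "tail", "has-part"),
    ("for", "hunting", "purpose"),
    ("for", "cutting", "purpose"),
    ("of", "wood", "material"),
    ("of", "metal", "material"),
]


def extract_properties(definition, model):
    """Run both stages and return (preposition, head, relation) triples."""
    return [(prep, head, model.classify(prep, head))
            for prep, head in candidate_phrases(definition)]


model = NaiveBayes()
model.train(TRAIN)
print(extract_properties("a small hound with long ears used for hunting", model))
```

A real system would replace the token window with proper syntactic analysis and would train on far more annotated definitions; the sketch only shows how an ambiguous preposition such as "with" can be resolved by a classifier that also looks at the word it governs.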

Cite

Text

O'Hara. "Empirical Acquisition of Word-Sense Distinctions." AAAI Conference on Artificial Intelligence, 1998.

Markdown

[O'Hara. "Empirical Acquisition of Word-Sense Distinctions." AAAI Conference on Artificial Intelligence, 1998.](https://mlanthology.org/aaai/1998/oaposhara1998aaai-empirical/)

BibTeX

@inproceedings{oaposhara1998aaai-empirical,
  title     = {{Empirical Acquisition of Word-Sense Distinctions}},
  author    = {O'Hara, Thomas P.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {1998},
  pages     = {1179},
  url       = {https://mlanthology.org/aaai/1998/oaposhara1998aaai-empirical/}
}