An Automatic Method for Generating Sense Tagged Corpora

Abstract

The unavailability of very large corpora with semantically disambiguated words is a major limitation in text processing research. For example, statistical methods for word sense disambiguation of free text are known to achieve high accuracy results when large corpora are available to develop context rules, to train and test them. This paper presents a novel approach to automatically generate arbitrarily large corpora for word senses. The method is based on (1) the information provided in WordNet, used to formulate queries consisting of synonyms or definitions of word senses, and (2) the information gathered from Internet using existing search engines. The method was tested on 120 word senses and a precision of 91% was observed.
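
The query-formulation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Sense` record, the `build_queries` helper, the `OR` and quoted-phrase syntax, and the sample entry are all assumptions made for illustration, since the abstract does not specify the actual query rules or search engine operators.

```python
from dataclasses import dataclass


@dataclass
class Sense:
    """Hypothetical stand-in for one WordNet sense entry."""
    word: str
    synonyms: list[str]
    definition: str


def build_queries(sense: Sense) -> list[str]:
    """Return candidate search-engine query strings for one word sense.

    Assumption: synonyms are OR-ed together as quoted phrases, and the
    definition is sent as one quoted phrase. The abstract only says that
    queries consist of synonyms or definitions of word senses.
    """
    queries = []
    # Skip the target word itself so the query relies on its synonyms.
    synonyms = [s for s in sense.synonyms if s.lower() != sense.word.lower()]
    if synonyms:
        queries.append(" OR ".join(f'"{s}"' for s in synonyms))
    definition = sense.definition.strip()
    if definition:
        queries.append(f'"{definition}"')
    return queries


# Illustrative sample entry, not taken from the paper.
sense = Sense(
    word="interest",
    synonyms=["interest", "involvement"],
    definition="a sense of concern with and curiosity about someone or something",
)
for query in build_queries(sense):
    print(query)
```

In this sketch each returned string would be submitted to a search engine, and the retrieved text would be labeled with the sense that produced the query. How results are filtered and tagged is left out, because the abstract does not describe it.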

Cite

Text

Mihalcea and Moldovan. "An Automatic Method for Generating Sense Tagged Corpora." AAAI Conference on Artificial Intelligence, 1999.

Markdown

[Mihalcea and Moldovan. "An Automatic Method for Generating Sense Tagged Corpora." AAAI Conference on Artificial Intelligence, 1999.](https://mlanthology.org/aaai/1999/mihalcea1999aaai-automatic/)

BibTeX

@inproceedings{mihalcea1999aaai-automatic,
  title     = {{An Automatic Method for Generating Sense Tagged Corpora}},
  author    = {Mihalcea, Rada and Moldovan, Dan I.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {1999},
  pages     = {461-466},
  url       = {https://mlanthology.org/aaai/1999/mihalcea1999aaai-automatic/}
}