A Mutually Beneficial Integration of Data Mining and Information Extraction

Abstract

Text mining concerns applying data mining techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DISCOTEX that combines IE and data mining methodologies to perform text mining as well as to improve the performance of the underlying extraction system. Rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of IE. Encouraging results are presented on applying these techniques to a corpus of computer job announcement postings from an Internet newsgroup.
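The feedback loop described in the abstract (mine rules from the extracted database, then use them to predict additional fillers for new documents) can be illustrated with a minimal sketch. This is not the DISCOTEX implementation: the simple co-occurrence rule miner, the function names `mine_rules` and `augment`, the thresholds, the toy job-posting records, and the check that a predicted value must appear in the document text are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_rules(records, min_support=2, min_confidence=0.8):
    """Mine single-antecedent rules (lhs -> rhs) over (slot, value) items.

    Hypothetical stand-in for a real rule learner: it only counts how
    often pairs of items co-occur in the extracted records.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for items in records:
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))

    rules = []
    for (a, b), n in pair_counts.items():
        if n < min_support:
            continue
        # Try the rule in both directions and keep the confident ones.
        for lhs, rhs in ((a, b), (b, a)):
            if n / item_counts[lhs] >= min_confidence:
                rules.append((lhs, rhs))
    return rules


def augment(extracted, text, rules):
    """Add predicted fillers to an IE result to raise recall.

    Assumption: a predicted filler is accepted only if its value string
    occurs somewhere in the document text.
    """
    result = set(extracted)
    lowered = text.lower()
    for lhs, rhs in rules:
        if lhs in result and rhs not in result and rhs[1].lower() in lowered:
            result.add(rhs)
    return result


# Toy database of previously extracted job postings (invented data).
records = [
    {("language", "java"), ("area", "web")},
    {("language", "java"), ("area", "web")},
    {("language", "java"), ("area", "web"), ("platform", "unix")},
    {("language", "c++"), ("platform", "unix")},
]
rules = mine_rules(records)

# The extractor found only the language; a mined rule supplies the area.
posting = "Seeking Java developer for Web applications."
print(augment({("language", "java")}, posting, rules))
```

In this sketch a rule fires only when its antecedent was already extracted and its predicted value is present in the posting, which keeps the recall gain from turning into unsupported guesses.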

Cite

Text

Nahm and Mooney. "A Mutually Beneficial Integration of Data Mining and Information Extraction." AAAI Conference on Artificial Intelligence, 2000.

Markdown

[Nahm and Mooney. "A Mutually Beneficial Integration of Data Mining and Information Extraction." AAAI Conference on Artificial Intelligence, 2000.](https://mlanthology.org/aaai/2000/nahm2000aaai-mutually/)

BibTeX

@inproceedings{nahm2000aaai-mutually,
  title     = {{A Mutually Beneficial Integration of Data Mining and Information Extraction}},
  author    = {Nahm, Un Yong and Mooney, Raymond J.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2000},
  pages     = {627--632},
  url       = {https://mlanthology.org/aaai/2000/nahm2000aaai-mutually/}
}