Mining Soft-Matching Rules from Textual Data
Abstract
Text mining concerns the discovery of knowledge from unstructured textual data. One important task is the discovery of rules that relate specific words and phrases. Although existing methods for this task learn traditional logical rules, soft-matching methods that utilize word-frequency information generally work better for textual data. This paper presents a rule induction system, TEXTRISE, that allows for partial matching of text-valued features by combining rule-based and instance-based learning. We present initial experiments applying TEXTRISE to corpora of book descriptions and patent documents retrieved from the web and compare its results to those of traditional rule-based and instance-based methods.
1 Introduction
Text mining, discovering knowledge from unstructured natural-language text, is an important data mining problem attracting increasing attention [Hearst, 1999; Feldman, 1999; Mladenic, 2000]. Existing methods for mining rules from text use a hard, logical ...
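The contrast the abstract draws between hard logical matching and soft matching on word frequencies can be illustrated with a small sketch. This is not the TEXTRISE implementation: the function names (`tf_vector`, `cosine_similarity`, `hard_match`, `soft_match`) are hypothetical, and the choice of raw term-frequency vectors compared by cosine similarity is an assumption made here for illustration only.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector for a text-valued feature."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_a * norm_b)


def hard_match(rule_text, doc_text):
    """Logical matching: every word of the rule must occur in the document."""
    return set(tf_vector(rule_text)) <= set(tf_vector(doc_text))


def soft_match(rule_text, doc_text):
    """Partial matching: a similarity score instead of a yes/no answer."""
    return cosine_similarity(tf_vector(rule_text), tf_vector(doc_text))


if __name__ == "__main__":
    rule = "data mining rules"        # illustrative rule condition
    doc = "mining rules from text"    # illustrative document field
    print(hard_match(rule, doc))      # the rule fails outright: "data" is missing
    print(soft_match(rule, doc))      # the rule still matches to a degree
```

Under these assumptions, a rule condition that shares most of its words with a document receives a graded score rather than being rejected for a single missing word, which is the behavior the abstract attributes to soft-matching methods.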
Cite
Text
Nahm and Mooney. "Mining Soft-Matching Rules from Textual Data." International Joint Conference on Artificial Intelligence, 2001.
Markdown
[Nahm and Mooney. "Mining Soft-Matching Rules from Textual Data." International Joint Conference on Artificial Intelligence, 2001.](https://mlanthology.org/ijcai/2001/nahm2001ijcai-mining/)
BibTeX
@inproceedings{nahm2001ijcai-mining,
title = {{Mining Soft-Matching Rules from Textual Data}},
author = {Nahm, Un Yong and Mooney, Raymond J.},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2001},
pages = {979-986},
url = {https://mlanthology.org/ijcai/2001/nahm2001ijcai-mining/}
}