Integrating Background Knowledge into Text Classification
Abstract
We describe three algorithms that use background knowledge to improve text classifiers. The first uses the background knowledge as an index into the set of training examples. The second uses background knowledge to re-express the training examples. The third treats pieces of background knowledge as unlabeled examples and actually classifies them. The choice of background knowledge affects each method’s performance, and we discuss which type of background knowledge is most useful for each method.
1 Using Background Knowledge
Supervised learning algorithms rely on a corpus of labeled training examples to produce accurate automatic text classifiers. Too few training examples often yield learned models that perform suboptimally on previously unseen examples. Numerous approaches have been taken to compensate for the lack of training examples. These include the use of unlabeled examples [Bennett and Demiriz, 1998; Blum and Mitchell, 1998; Nigam et al., 2000; Goldman and Zhou, 2000], the use of test examples [Joachims, 1999], and choosing a small set of specific unlabeled examples to be manually classified [Lewis and Gale, 1994]. Our approach assumes the availability of neither unlabeled examples nor test examples. Because so much data is now available, text, databases, and other sources of knowledge related to a text classification task can often be obtained readily from the World Wide Web. We incorporate such “background knowledge” into different learners to improve the classification of unknown instances. Using external, readily available textual resources allows a learning system to model the domain in a way that would be impossible with only a small set of training instances.
For example, if the task is to determine the physics sub-discipline to which a paper title belongs, learners can use background knowledge such as abstracts, physics newsgroup postings, and perhaps even reviews of physics books to build more accurate classifiers.
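To make the idea of background knowledge as an index into the training set more concrete, here is a minimal sketch built on the physics-title example. It rests on several assumptions of ours: bag-of-words TF-IDF vectors, cosine similarity, and a similarity-weighted vote routed through the `k` background documents closest to the test title. The helper names (`classify_via_background`, `build_idf`, and so on), the parameter `k`, and the toy titles and abstracts are all hypothetical. The sketch is not claimed to be the paper's algorithm.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization (a simplifying assumption; no stemming or stop words).
    return text.lower().split()


def build_idf(docs):
    # Smoothed inverse document frequency over a list of token lists.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return lambda term: math.log((1 + n) / (1 + df[term])) + 1.0


def vectorize(tokens, idf):
    # Sparse TF-IDF vector stored as {term: weight}.
    return {t: c * idf(t) for t, c in Counter(tokens).items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify_via_background(query, training, background, k=2):
    """Hypothetical helper: label `query` by routing it through background documents.

    training: list of (text, label) pairs; background: list of unlabeled texts.
    """
    train_toks = [tokenize(text) for text, _ in training]
    bg_toks = [tokenize(text) for text in background]
    idf = build_idf(train_toks + bg_toks)
    q = vectorize(tokenize(query), idf)
    train_vecs = [vectorize(t, idf) for t in train_toks]
    bg_vecs = [vectorize(b, idf) for b in bg_toks]

    # Step 1: rank background documents by similarity to the query and keep the top k.
    ranked = sorted(((cosine(q, b), b) for b in bg_vecs), key=lambda p: p[0], reverse=True)

    # Step 2: each selected background document votes for the labels of the training
    # examples it resembles, weighted by how close that document is to the query.
    scores = defaultdict(float)
    for weight, b in ranked[:k]:
        for vec, (_, label) in zip(train_vecs, training):
            scores[label] += weight * cosine(b, vec)
    return max(scores, key=scores.get) if scores else None


# Toy physics data (invented for illustration).
training = [
    ("quark gluon plasma", "nuclear"),
    ("superconducting thin films", "condensed matter"),
]
background = [
    "we study the quark gluon plasma formed in heavy ion collisions",
    "heavy ion collisions produce hot nuclear matter",
    "electron pairing in superconducting films at low temperature",
]
print(classify_via_background("hot nuclear matter in heavy ion collisions", training, background))
# prints: nuclear
```

In this toy case the test title shares no words with either training title, so a direct nearest-neighbour comparison would score both labels at zero. The background abstracts supply the missing link to the "nuclear" training example.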
Cite
Zelikovitz and Hirsh. "Integrating Background Knowledge into Text Classification." International Joint Conference on Artificial Intelligence, 2003.
@inproceedings{zelikovitz2003ijcai-integrating,
title = {{Integrating Background Knowledge into Text Classification}},
author = {Zelikovitz, Sarah and Hirsh, Haym},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2003},
pages = {1448-1449},
url = {https://mlanthology.org/ijcai/2003/zelikovitz2003ijcai-integrating/}
}