Integrating Multiple Internet Directories by Instance-Based Learning

Abstract

Finding desired information on the Internet is becoming increasingly difficult. Internet directories such as Yahoo!, which organize web pages into hierarchical categories, provide one solution to this problem; however, such directories are of limited use because some bias is applied both in the collection and categorization of pages. We propose a method for integrating multiple Internet directories by instance-based learning. Our method provides the mapping of categories in order to transfer documents from one directory to another, instead of simply merging two directories into one. We present herein an effective algorithm for determining similar categories between two directories via a statistical method called the κ-statistic. In order to evaluate the proposed method, we conducted experiments using two actual Internet directories, Yahoo! and Google. The results show that the proposed method achieves extensive improvements relative to both the Naive Bayes and Enhanced Naive Bayes approaches, without any text analysis on documents.
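The abstract does not spell out how the κ-statistic is computed, so the following is only a minimal sketch of the standard (Cohen's) κ applied to category membership, assuming that each document shared by both directories either is or is not filed under a given category in each one. The function names `kappa_statistic` and `category_agreement`, and the representation of categories as sets of document identifiers, are hypothetical and are not taken from the paper.

```python
def kappa_statistic(n11, n10, n01, n00):
    """Cohen's kappa for a 2x2 contingency table.

    n11: documents in both categories
    n10: documents in the first category only
    n01: documents in the second category only
    n00: documents in neither category
    """
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("contingency table is empty")
    # Observed agreement: both directories make the same in/out decision.
    observed = (n11 + n00) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = ((n11 + n10) * (n11 + n01) + (n01 + n00) * (n10 + n00)) / total ** 2
    if expected == 1.0:
        # Degenerate table (all documents in one cell); kappa is undefined,
        # and returning 0.0 here is an assumption of this sketch.
        return 0.0
    return (observed - expected) / (1.0 - expected)


def category_agreement(docs_a, docs_b, shared_docs):
    """Kappa between two categories, counted over documents found in both directories."""
    a = docs_a & shared_docs
    b = docs_b & shared_docs
    n11 = len(a & b)
    n10 = len(a - b)
    n01 = len(b - a)
    n00 = len(shared_docs) - n11 - n10 - n01
    return kappa_statistic(n11, n10, n01, n00)


# Illustrative data only: ten shared documents identified by integers.
shared = set(range(1, 11))
score = category_agreement({1, 2, 3, 4}, {1, 2, 3, 5}, shared)
```

A value near 1 suggests that the two categories hold largely the same shared documents, while a value near 0 suggests the overlap is no better than chance. How the paper thresholds or ranks such scores is not stated in the abstract.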

Cite

Text

Ichise et al. "Integrating Multiple Internet Directories by Instance-Based Learning." International Joint Conference on Artificial Intelligence, 2003.

Markdown

[Ichise et al. "Integrating Multiple Internet Directories by Instance-Based Learning." International Joint Conference on Artificial Intelligence, 2003.](https://mlanthology.org/ijcai/2003/ichise2003ijcai-integrating/)

BibTeX

@inproceedings{ichise2003ijcai-integrating,
  title     = {{Integrating Multiple Internet Directories by Instance-Based Learning}},
  author    = {Ichise, Ryutaro and Takeda, Hideaki and Honiden, Shinichi},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2003},
  pages     = {22-30},
  url       = {https://mlanthology.org/ijcai/2003/ichise2003ijcai-integrating/}
}