Integrating Multiple Internet Directories by Instance-Based Learning
Abstract
Finding desired information on the Internet is becoming increasingly difficult. Internet directories such as Yahoo!, which organize web pages into hierarchical categories, provide one solution to this problem. However, such directories are of limited use because bias is introduced in both the collection and the categorization of pages. We propose a method for integrating multiple Internet directories by instance-based learning. Our method provides a mapping between categories so that documents can be transferred from one directory to another, instead of simply merging the two directories into one. We present an effective algorithm for determining similar categories between two directories using a statistical measure called the κ-statistic. To evaluate the proposed method, we conducted experiments using two actual Internet directories, Yahoo! and Google. The results show that the proposed method achieves substantial improvements over both the Naive Bayes and Enhanced Naive Bayes approaches, without performing any text analysis of the documents.
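The abstract names the κ-statistic as the measure of category similarity but does not give its formula. The sketch below assumes Cohen's kappa computed on a 2×2 agreement table. The table is built from documents that appear in both directories, and membership in a source category and in a target category are treated as two yes/no judgments. The authors' actual algorithm may differ, for example in how it uses the category hierarchy or applies thresholds. The function names `kappa_2x2` and `best_target` are hypothetical and chosen for illustration only.

```python
def kappa_2x2(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for a 2x2 agreement table (assumed form of the kappa-statistic).

    a: shared documents in both the source and the target category
    b: shared documents in the source category only
    c: shared documents in the target category only
    d: shared documents in neither category
    """
    n = a + b + c + d
    if n == 0:
        return 0.0
    # Observed agreement: both say "member" or both say "not a member".
    p_observed = (a + d) / n
    # Expected agreement by chance, from the marginal proportions.
    p_src = (a + b) / n
    p_tgt = (a + c) / n
    p_expected = p_src * p_tgt + (1 - p_src) * (1 - p_tgt)
    if p_expected == 1.0:
        return 0.0  # degenerate table: kappa is undefined, treat as no signal
    return (p_observed - p_expected) / (1 - p_expected)


def best_target(source_docs: set, target_categories: dict, shared: set):
    """Return (name, kappa) of the target category that agrees most with a source category.

    source_docs: document ids (e.g. URLs) in one source-directory category
    target_categories: target category name -> set of document ids
    shared: document ids present in both directories (the only usable evidence)
    """
    src = source_docs & shared
    best_name, best_k = None, float("-inf")
    for name, docs in target_categories.items():
        tgt = docs & shared
        a = len(src & tgt)
        b = len(src - tgt)
        c = len(tgt - src)
        d = len(shared) - a - b - c
        k = kappa_2x2(a, b, c, d)
        if k > best_k:
            best_name, best_k = name, k
    return best_name, best_k
```

For example, with six shared documents and a source category containing three of them, a target category holding exactly those three scores κ = 1.0 and is selected. A category with no overlap scores negative. Under this assumption, a document from the source category would then be transferred to the selected target category. Note that only category co-membership is used, with no text analysis, which matches the abstract's claim.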
Cite
Text
Ichise et al. "Integrating Multiple Internet Directories by Instance-Based Learning." International Joint Conference on Artificial Intelligence, 2003.
Markdown
[Ichise et al. "Integrating Multiple Internet Directories by Instance-Based Learning." International Joint Conference on Artificial Intelligence, 2003.](https://mlanthology.org/ijcai/2003/ichise2003ijcai-integrating/)
BibTeX
@inproceedings{ichise2003ijcai-integrating,
title = {{Integrating Multiple Internet Directories by Instance-Based Learning}},
author = {Ichise, Ryutaro and Takeda, Hideaki and Honiden, Shinichi},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2003},
pages = {22--30},
url = {https://mlanthology.org/ijcai/2003/ichise2003ijcai-integrating/}
}