Web Page Classification Based on Uncorrelated Semi-Supervised Intra-View and Inter-View Manifold Discriminant Feature Extraction
Abstract
Web page classification has attracted increasing research interest. It is intrinsically a multi-view and semi-supervised problem. Web pages usually contain two or more types of data, such as text, hyperlinks and images, and unlabeled pages generally far outnumber labeled ones. Web page data is also commonly high-dimensional. Extracting useful features from such data in the multi-view semi-supervised scenario is therefore important for web page classification. To our knowledge, only one method has been specifically proposed for this task. The few semi-supervised multi-view feature extraction methods developed for other applications still leave much room for improvement. In this paper, we first design a feature extraction schema called semi-supervised intra-view and inter-view manifold discriminant (SI2MD) learning. It fully exploits the intra-view and inter-view discriminant information of labeled samples and the local neighborhood structures of unlabeled samples. We then design a semi-supervised uncorrelation constraint for the SI2MD schema to remove multi-view correlation in the semi-supervised scenario. By combining the SI2MD schema with this constraint, we propose an uncorrelated semi-supervised intra-view and inter-view manifold discriminant (USI2MD) learning approach for web page classification. Experiments on public web page databases validate the proposed approach.
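The abstract names the ingredients of SI2MD/USI2MD but does not give the exact objective. The following NumPy sketch combines those ingredients in one plausible way, as a rough illustration only. It is not the authors' algorithm, and every name and parameter in it (`lift`, `knn_laplacian`, `scatter`, `fit`, `transform`, `k`, `gamma`, `alpha`, `eps`) is hypothetical.

The sketch rests on the following assumptions:

- The views are paired row by row, and the first `n_labeled` rows are labeled.
- Labeled samples from all views are projected into a common space and pooled per class, so the within-class scatter covers both intra-view and inter-view spread.
- A per-view k-nearest-neighbour graph Laplacian over all samples stands in for the local neighborhood structure of unlabeled data.
- The uncorrelation constraint is taken to be the classic `W^T S_t W = I` on the ridge-regularized total scatter of all projected samples.

Under these assumptions, maximizing tr(W^T (S_b - gamma (S_w + alpha M)) W) subject to the constraint reduces to a generalized symmetric eigenproblem.

```python
import numpy as np

# Hypothetical sketch; NOT the USI2MD algorithm from the paper.


def lift(X_views, v):
    """Embed view-v samples in the stacked space: block v holds x, other blocks are zero.
    Then lift(X_views, v) @ W equals X_v @ W_v for the stacked projection W."""
    dims = [X.shape[1] for X in X_views]
    offset = sum(dims[:v])
    Z = np.zeros((X_views[v].shape[0], sum(dims)))
    Z[:, offset:offset + dims[v]] = X_views[v]
    return Z


def knn_laplacian(X, k):
    """Unnormalized Laplacian of a symmetric 0/1 k-nearest-neighbour graph."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros_like(d2)
    A[np.repeat(np.arange(len(X)), k), nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)  # symmetrize the neighbour relation
    return np.diag(A.sum(axis=1)) - A


def scatter(Z):
    """Scatter matrix of the rows of Z around their mean."""
    Zc = Z - Z.mean(axis=0)
    return Zc.T @ Zc


def fit(X_views, y, n_labeled, dim, k=5, gamma=1.0, alpha=0.1, eps=1e-6):
    """X_views: list of (n, d_v) arrays with rows paired across views.
    y: labels of the first n_labeled rows. Returns (W, S_t) with W^T S_t W = I."""
    V = len(X_views)
    Z_all = [lift(X_views, v) for v in range(V)]  # each (n, D)
    D = Z_all[0].shape[1]
    Z_lab = np.vstack([Z[:n_labeled] for Z in Z_all])  # labeled rows from every view
    y_lab = np.tile(y, V)
    mu = Z_lab.mean(axis=0)
    S_b = np.zeros((D, D))
    S_w = np.zeros((D, D))
    for c in np.unique(y_lab):
        Zc = Z_lab[y_lab == c]  # class c pooled over all views
        m = Zc.mean(axis=0)
        S_b += len(Zc) * np.outer(m - mu, m - mu)
        S_w += scatter(Zc)  # intra-view and inter-view spread of class c
    # Manifold term over labeled + unlabeled rows: block v is X_v^T L_v X_v.
    M = sum(Z.T @ knn_laplacian(X, k) @ Z for Z, X in zip(Z_all, X_views))
    S_t = scatter(np.vstack(Z_all)) + eps * np.eye(D)  # ridge keeps S_t positive definite
    A = S_b - gamma * (S_w + alpha * M)
    # Solve A w = lam S_t w by Cholesky whitening: S_t = L L^T, C = L^-1 A L^-T.
    L = np.linalg.cholesky(S_t)
    Linv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(Linv @ A @ Linv.T)
    top = evecs[:, np.argsort(evals)[::-1][:dim]]  # largest eigenvalues first
    W = Linv.T @ top  # satisfies W^T S_t W = I (uncorrelated features)
    return W, S_t


def transform(X_views, W, v):
    """Project the samples of view v into the common feature space."""
    return lift(X_views, v) @ W
```

A trace-difference criterion is used here instead of a ratio so that the generalized eigenvectors enforce the uncorrelation constraint exactly. A typical call is `W, _ = fit([X_text, X_links], y, n_labeled, dim=10)`, followed by `transform([X_text, X_links], W, 0)`.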
Cite
Jing et al. "Web Page Classification Based on Uncorrelated Semi-Supervised Intra-View and Inter-View Manifold Discriminant Feature Extraction." International Joint Conference on Artificial Intelligence, 2015.
@inproceedings{jing2015ijcai-web,
title = {{Web Page Classification Based on Uncorrelated Semi-Supervised Intra-View and Inter-View Manifold Discriminant Feature Extraction}},
author = {Jing, Xiao-Yuan and Liu, Qian and Wu, Fei and Xu, Baowen and Zhu, Yang-Ping and Chen, Songcan},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2015},
pages = {2255-2261},
url = {https://mlanthology.org/ijcai/2015/jing2015ijcai-web/}
}