Quality-Based Learning for Web Data Classification

Abstract

Web data vary in both the quantity and the quality of the information they carry. For example, some pages contain a great deal of text, whereas others contain very little; some web videos are high-resolution, whereas others are low-resolution. As a consequence, the quality of the features extracted from different web data may also vary greatly. Existing learning algorithms for web data classification usually ignore these variations in information quality and quantity. In this paper, the information quantity and quality of web data are described by quality-related factors such as text length and image quantity, and a new learning method is proposed to train classifiers based on these factors. The method divides the training data into subsets according to the clustering results of the quality-related factors and then trains a classifier for each subset by using a multi-task learning strategy. Experimental results indicate that the quality-related factors are useful in web data classification, and that the proposed method outperforms conventional algorithms that do not consider information quantity and quality.
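
The two-stage idea in the abstract (cluster on quality-related factors, then train one classifier per subset with a multi-task strategy) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name a clustering algorithm or a multi-task formulation, so plain k-means and a regularized logistic regression with weights of the form shared + group-specific are assumptions chosen for brevity. The function names `kmeans`, `assign`, `train_multitask`, and `predict`, and the parameters `lam`, `lr`, and `epochs`, are hypothetical.

```python
import math
import random


def assign(point, centers):
    """Index of the nearest center by squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centers[c])))


def kmeans(points, k, iters=20, seed=0):
    """Cluster quality-factor vectors (assumed algorithm: plain k-means).

    Returns (cluster index per point, list of centers).
    """
    centers = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [assign(p, centers) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def train_multitask(X, y, groups, k, lam=1.0, lr=0.1, epochs=200):
    """Assumed multi-task scheme: group g uses weights shared + specific[g].

    Every sample updates the shared part; lam penalizes the group-specific
    parts so that small subsets borrow strength from the others.
    """
    d, n = len(X[0]), len(X)
    shared = [0.0] * d
    specific = [[0.0] * d for _ in range(k)]
    for _ in range(epochs):
        g_shared = [0.0] * d
        g_spec = [[lam * v for v in specific[g]] for g in range(k)]
        for xi, yi, gi in zip(X, y, groups):
            z = sum((s + v) * x for s, v, x in zip(shared, specific[gi], xi))
            z = max(-30.0, min(30.0, z))  # avoid overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # logistic-loss gradient
            for j in range(d):
                g_shared[j] += err * xi[j] / n
                g_spec[gi][j] += err * xi[j] / n
        shared = [s - lr * g for s, g in zip(shared, g_shared)]
        specific = [[v - lr * g for v, g in zip(specific[t], g_spec[t])]
                    for t in range(k)]
    return shared, specific


def predict(x, group, shared, specific):
    """Binary prediction with the classifier of the sample's quality group."""
    z = sum((s + v) * xj for s, v, xj in zip(shared, specific[group], x))
    return 1 if z > 0 else 0
```

In this sketch, a new sample would first be routed to a quality group with `assign(quality_vector, centers)` and then classified with `predict` using that group's weights. Setting `lam` very large pushes all groups toward a single shared classifier, while `lam = 0` lets each group's weights move freely.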

Cite

Text

Wu et al. "Quality-Based Learning for Web Data Classification." AAAI Conference on Artificial Intelligence, 2014. doi:10.1609/AAAI.V28I1.8705

Markdown

[Wu et al. "Quality-Based Learning for Web Data Classification." AAAI Conference on Artificial Intelligence, 2014.](https://mlanthology.org/aaai/2014/wu2014aaai-quality/) doi:10.1609/AAAI.V28I1.8705

BibTeX

@inproceedings{wu2014aaai-quality,
  title     = {{Quality-Based Learning for Web Data Classification}},
  author    = {Wu, Ou and Hu, Ruiguang and Mao, Xue and Hu, Weiming},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2014},
  pages     = {194-200},
  doi       = {10.1609/AAAI.V28I1.8705},
  url       = {https://mlanthology.org/aaai/2014/wu2014aaai-quality/}
}