Harvesting Mid-Level Visual Concepts from Large-Scale Internet Images

Abstract

Obtaining effective mid-level representations has become an increasingly important task in computer vision. In this paper, we propose a fully automatic algorithm that harvests visual concepts from a large number of Internet images (more than a quarter of a million) using text-based queries. Existing approaches to visual concept learning from Internet images either rely on strong supervision with detailed manual annotations or learn only image-level classifiers. Here, we take advantage of massive, well-organized Google and Bing image data: around 14,000 visual concepts are automatically extracted from images using word-based queries. Using the learned visual concepts, we show state-of-the-art performance on a variety of benchmark datasets, demonstrating the effectiveness of the learned mid-level representations and their ability to generalize well to general natural images. Our method shows significant improvement over competing systems in image classification, including those with strong supervision.

Cite

Text

Li et al. "Harvesting Mid-Level Visual Concepts from Large-Scale Internet Images." Conference on Computer Vision and Pattern Recognition, 2013. doi:10.1109/CVPR.2013.115

Markdown

[Li et al. "Harvesting Mid-Level Visual Concepts from Large-Scale Internet Images." Conference on Computer Vision and Pattern Recognition, 2013.](https://mlanthology.org/cvpr/2013/li2013cvpr-harvesting/) doi:10.1109/CVPR.2013.115

BibTeX

@inproceedings{li2013cvpr-harvesting,
  title     = {{Harvesting Mid-Level Visual Concepts from Large-Scale Internet Images}},
  author    = {Li, Quannan and Wu, Jiajun and Tu, Zhuowen},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2013},
  doi       = {10.1109/CVPR.2013.115},
  url       = {https://mlanthology.org/cvpr/2013/li2013cvpr-harvesting/}
}