Self-Supervised Learning of Visual Features Through Embedding Images into Text Topic Spaces

Lluis Gomez, Yash Patel, Marcal Rusinol, Dimosthenis Karatzas, C. V. Jawahar

CVPR 2017

doi:10.1109/CVPR.2017.218

Abstract

End-to-end training from scratch of current deep architectures for new computer vision problems would require ImageNet-scale datasets, which are not always available. In this paper we present a method that takes advantage of freely available multi-modal content to train computer vision algorithms without human supervision. We put forward the idea of performing self-supervised learning of visual features by mining a large-scale corpus of multi-modal (text and image) documents. We show that discriminative visual features can be learnt efficiently by training a CNN to predict the semantic context in which a particular image is more likely to appear as an illustration. For this we leverage the hidden semantic structures discovered in the text corpus with a well-known topic modeling technique. Our experiments demonstrate state-of-the-art performance in image classification, object detection, and multi-modal retrieval compared to recent self-supervised or naturally supervised approaches.
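
The training signal described in the abstract can be illustrated with a minimal sketch. It assumes the topic model assigns each document a probability distribution over topics, and that the CNN's output for an image is scored against the distribution of the text it illustrates using a softmax cross-entropy. The loss choice, the function names, and the toy numbers are assumptions made for illustration only; the abstract does not specify them.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def topic_cross_entropy(logits, topic_targets, eps=1e-12):
    """Mean cross-entropy between predicted and target topic distributions.

    logits:        (batch, n_topics) raw image-network outputs (hypothetical)
    topic_targets: (batch, n_topics) per-document topic probabilities
    """
    probs = softmax(logits)
    per_sample = -(topic_targets * np.log(probs + eps)).sum(axis=-1)
    return float(per_sample.mean())


# Hypothetical batch: 2 images, 4 topics. Each row is the topic distribution
# of the text document in which the image appears (made-up values).
topic_targets = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
])

# Logits that reproduce the targets exactly, and a mismatched alternative.
logits_good = np.log(topic_targets)
logits_bad = logits_good[:, ::-1]

loss_good = topic_cross_entropy(logits_good, topic_targets)
loss_bad = topic_cross_entropy(logits_bad, topic_targets)
```

In this sketch the loss is lowest when the predicted distribution matches the text-derived one, so `loss_good` is smaller than `loss_bad`; no human-provided labels enter the computation.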


Cite

Text

Gomez et al. "Self-Supervised Learning of Visual Features Through Embedding Images into Text Topic Spaces." Conference on Computer Vision and Pattern Recognition, 2017. doi:10.1109/CVPR.2017.218

Markdown

[Gomez et al. "Self-Supervised Learning of Visual Features Through Embedding Images into Text Topic Spaces." Conference on Computer Vision and Pattern Recognition, 2017.](https://mlanthology.org/cvpr/2017/gomez2017cvpr-selfsupervised/) doi:10.1109/CVPR.2017.218

BibTeX

@inproceedings{gomez2017cvpr-selfsupervised,
  title     = {{Self-Supervised Learning of Visual Features Through Embedding Images into Text Topic Spaces}},
  author    = {Gomez, Lluis and Patel, Yash and Rusinol, Marcal and Karatzas, Dimosthenis and Jawahar, C. V.},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2017},
  doi       = {10.1109/CVPR.2017.218},
  url       = {https://mlanthology.org/cvpr/2017/gomez2017cvpr-selfsupervised/}
}