Improving Image-Sentence Embeddings Using Large Weakly Annotated Photo Collections
Abstract
This paper studies the problem of associating images with descriptive sentences by embedding them in a common latent space. We are interested in learning such embeddings from hundreds of thousands or millions of examples. Unfortunately, it is prohibitively expensive to fully annotate this many training images with ground-truth sentences. Instead, we ask whether we can learn better image-sentence embeddings by augmenting small fully annotated training sets with millions of images that have weak and noisy annotations (titles, tags, or descriptions). After investigating several state-of-the-art scalable embedding methods, we introduce a new algorithm called Stacked Auxiliary Embedding that can successfully transfer knowledge from millions of weakly annotated images to improve the accuracy of retrieval-based image description.
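To make the retrieval setting concrete, here is a minimal sketch of cross-modal retrieval in a shared latent space. The projection matrices `W_img` and `W_sent` stand in for embeddings the paper would learn from annotated data; here they are random placeholders, and the dimensions are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: raw image features, raw sentence features,
# and the shared latent space (all chosen for illustration only).
d_img, d_sent, d_latent = 4096, 3000, 128

# W_img and W_sent stand in for learned projections; the paper learns
# these from fully and weakly annotated data, but random matrices
# suffice to show the retrieval mechanics.
W_img = rng.standard_normal((d_img, d_latent))
W_sent = rng.standard_normal((d_sent, d_latent))

def embed(x, W):
    """Project a raw feature vector into the latent space and L2-normalize."""
    z = x @ W
    return z / np.linalg.norm(z)

# One query image and a small pool of candidate sentences.
image = rng.standard_normal(d_img)
sentences = rng.standard_normal((5, d_sent))

z_img = embed(image, W_img)
z_sents = np.stack([embed(s, W_sent) for s in sentences])

# Retrieval-based description: rank candidate sentences by cosine
# similarity (dot product of unit vectors) to the query image.
scores = z_sents @ z_img
ranking = np.argsort(-scores)
print(ranking)
```

With learned projections in place of the random ones, the top-ranked sentence would serve as the retrieved description for the query image.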
Cite
Text
Gong et al. "Improving Image-Sentence Embeddings Using Large Weakly Annotated Photo Collections." European Conference on Computer Vision, 2014. doi:10.1007/978-3-319-10593-2_35

Markdown

[Gong et al. "Improving Image-Sentence Embeddings Using Large Weakly Annotated Photo Collections." European Conference on Computer Vision, 2014.](https://mlanthology.org/eccv/2014/gong2014eccv-improving/) doi:10.1007/978-3-319-10593-2_35

BibTeX
@inproceedings{gong2014eccv-improving,
title = {{Improving Image-Sentence Embeddings Using Large Weakly Annotated Photo Collections}},
author = {Gong, Yunchao and Wang, Liwei and Hodosh, Micah and Hockenmaier, Julia and Lazebnik, Svetlana},
booktitle = {European Conference on Computer Vision},
year = {2014},
pages = {529--545},
doi = {10.1007/978-3-319-10593-2_35},
url = {https://mlanthology.org/eccv/2014/gong2014eccv-improving/}
}