Ranking and Retrieval of Image Sequences from Multiple Paragraph Queries
Abstract
We propose a method to rank and retrieve image sequences from a natural language text query, consisting of multiple sentences or paragraphs. One of the method's key applications is to visualize visitors' text-only reviews on TripAdvisor or Yelp, by automatically retrieving the most illustrative image sequences. While most previous work has dealt with the relations between a natural language sentence and an image or a video, our work extends to the relations between paragraphs and image sequences. Our approach leverages the vast user-generated resource of blog posts and photo streams on the Web. We use blog posts as text-image parallel training data that co-locate informative text with representative images carefully selected by users. We exploit large-scale photo streams to augment the image samples for retrieval. We design a latent structural SVM framework to learn the semantic relevance relations between text and image sequences. We present both quantitative and qualitative results on the newly created Disneyland dataset.
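For context on the learning framework the abstract names, the standard latent structural SVM training objective (margin rescaling, in the style of Yu and Joachims) is sketched below. This is a generic formulation, not necessarily the paper's exact model; the symbols (paragraph query $x_i$, ground-truth image sequence $y_i$, latent variables $h$, joint feature map $\Psi$, task loss $\Delta$, regularization constant $C$) are illustrative assumptions.

$$
\min_{\mathbf{w}} \;\; \frac{1}{2}\lVert \mathbf{w} \rVert^2
+ C \sum_{i=1}^{N} \Big[
\max_{\hat{y},\,\hat{h}} \big( \mathbf{w}^{\top} \Psi(x_i, \hat{y}, \hat{h}) + \Delta(y_i, \hat{y}) \big)
- \max_{h} \, \mathbf{w}^{\top} \Psi(x_i, y_i, h)
\Big]
$$

The first inner maximization is loss-augmented inference over candidate image sequences and latent alignments; the second scores the ground-truth sequence under its best latent alignment. Their difference upper-bounds the task loss $\Delta$, so minimizing it drives the model to rank relevant image sequences above irrelevant ones for each paragraph query.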
Cite
Text
Kim et al. "Ranking and Retrieval of Image Sequences from Multiple Paragraph Queries." Conference on Computer Vision and Pattern Recognition, 2015. doi:10.1109/CVPR.2015.7298810
Markdown
[Kim et al. "Ranking and Retrieval of Image Sequences from Multiple Paragraph Queries." Conference on Computer Vision and Pattern Recognition, 2015.](https://mlanthology.org/cvpr/2015/kim2015cvpr-ranking/) doi:10.1109/CVPR.2015.7298810
BibTeX
@inproceedings{kim2015cvpr-ranking,
title = {{Ranking and Retrieval of Image Sequences from Multiple Paragraph Queries}},
author = {Kim, Gunhee and Moon, Seungwhan and Sigal, Leonid},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2015},
doi = {10.1109/CVPR.2015.7298810},
url = {https://mlanthology.org/cvpr/2015/kim2015cvpr-ranking/}
}