Learning Structured Appearance Models from Captioned Images of Cluttered Scenes
Abstract
Given an unstructured collection of captioned images of cluttered scenes featuring a variety of objects, our goal is to learn both the names and appearances of the objects. Only a small number of local features within any given image are associated with a particular caption word. We describe a connected graph appearance model where vertices represent local features and edges encode spatial relationships. We use the repetition of feature neighborhoods across training images and a measure of correspondence with caption words to guide the search for meaningful feature configurations. We demonstrate improved results on a dataset to which an unstructured object model was previously applied. We also apply the new method to a more challenging collection of captioned images from the web, detecting and annotating objects within highly cluttered realistic scenes.
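The abstract's appearance model is a connected graph whose vertices are local image features and whose edges encode spatial relationships between them. A minimal sketch of such a structure, assuming a simple relative-offset encoding for edges (the descriptor format, class names, and edge encoding here are illustrative assumptions, not the paper's actual representation):

```python
from dataclasses import dataclass, field

@dataclass
class FeatureVertex:
    """A local image feature (e.g., an interest-point descriptor)."""
    descriptor: tuple  # appearance descriptor (hypothetical encoding)
    position: tuple    # (x, y) location in the image


@dataclass
class AppearanceGraph:
    """Connected graph appearance model: vertices are local features,
    edges store a pairwise spatial relation (here, relative offset)."""
    vertices: list = field(default_factory=list)
    edges: dict = field(default_factory=dict)  # (i, j) -> (dx, dy)

    def add_vertex(self, v):
        self.vertices.append(v)
        return len(self.vertices) - 1

    def add_edge(self, i, j):
        """Record the spatial relation from vertex i to vertex j."""
        vi, vj = self.vertices[i], self.vertices[j]
        self.edges[(i, j)] = (vj.position[0] - vi.position[0],
                              vj.position[1] - vi.position[1])


# Usage: two features joined by one spatial edge.
g = AppearanceGraph()
a = g.add_vertex(FeatureVertex(descriptor=(0.1, 0.4), position=(10, 20)))
b = g.add_vertex(FeatureVertex(descriptor=(0.3, 0.2), position=(15, 30)))
g.add_edge(a, b)
```

Detection would then amount to matching such a configuration of features and relations against a new image, rather than matching isolated features.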
Cite
Text
Jamieson et al. "Learning Structured Appearance Models from Captioned Images of Cluttered Scenes." IEEE/CVF International Conference on Computer Vision, 2007. doi:10.1109/ICCV.2007.4408877
Markdown
[Jamieson et al. "Learning Structured Appearance Models from Captioned Images of Cluttered Scenes." IEEE/CVF International Conference on Computer Vision, 2007.](https://mlanthology.org/iccv/2007/jamieson2007iccv-learning/) doi:10.1109/ICCV.2007.4408877
BibTeX
@inproceedings{jamieson2007iccv-learning,
title = {{Learning Structured Appearance Models from Captioned Images of Cluttered Scenes}},
author = {Jamieson, Michael and Fazly, Afsaneh and Dickinson, Sven J. and Stevenson, Suzanne and Wachsmuth, Sven},
booktitle = {IEEE/CVF International Conference on Computer Vision},
year = {2007},
pages = {1-8},
doi = {10.1109/ICCV.2007.4408877},
url = {https://mlanthology.org/iccv/2007/jamieson2007iccv-learning/}
}