Learning Reconfigurable Scene Representation by Tangram Model

Abstract

This paper proposes a principled method for learning a reconfigurable and sparse scene representation in the joint space of spatial configuration and appearance. We call it the tangram model, and it has three properties: (1) In contrast to the fixed structure of the spatial pyramid widely used in the literature, a compositional shape dictionary organized in an And-Or directed acyclic graph (AOG) quantizes the space of spatial configurations. (2) The shape primitives in the dictionary (called tans) can be described using any "off-the-shelf" appearance features, chosen according to the task. (3) A dynamic programming (DP) algorithm is used to learn the globally optimal parse tree in the joint space of spatial configuration and appearance. We demonstrate the tangram model in both a generative learning formulation and a discriminative matching kernel. In experiments, we show that the tangram model captures meaningful spatial configurations as well as appearance for various scene categories, and achieves state-of-the-art classification performance on the LSP 15-class scene dataset and the MIT 67-class indoor scene dataset.
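
The DP search over an And-Or graph described in property (3) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes rectangular tans on a small grid (the abstract does not specify the shape primitives), binary horizontal or vertical cuts as the only And-nodes, and an additive score. The names `best_parse` and `make_toy_score`, and the toy scoring rule itself, are hypothetical.

```python
from functools import lru_cache


def make_toy_score(grid, tan_cost=0.5):
    """Hypothetical appearance score for a half-open rectangle (r0, c0, r1, c1).

    A uniform rectangle earns its area minus a per-tan cost (favouring sparse
    parses); a non-uniform rectangle is penalised by its area.
    """
    def score(rect):
        r0, c0, r1, c1 = rect
        values = {grid[r][c] for r in range(r0, r1) for c in range(c0, c1)}
        area = (r1 - r0) * (c1 - c0)
        return area - tan_cost if len(values) == 1 else -float(area)
    return score


def best_parse(score, rows, cols):
    """Return (best_score, parse_tree) for the full rows x cols grid.

    A parse tree is either a terminal rectangle (a 4-tuple of ints) or a
    pair (subtree_a, subtree_b) produced by one cut.
    """
    @lru_cache(maxsize=None)  # each rectangle (Or-node) is solved only once
    def solve(r0, c0, r1, c1):
        rect = (r0, c0, r1, c1)
        # Or-node, option 1: stop here and use the rectangle as a terminal tan.
        best = (score(rect), rect)
        # Or-node, other options: And-nodes cutting the rectangle by rows...
        for r in range(r0 + 1, r1):
            top, bottom = solve(r0, c0, r, c1), solve(r, c0, r1, c1)
            total = top[0] + bottom[0]
            if total > best[0]:
                best = (total, (top[1], bottom[1]))
        # ...or by columns.
        for c in range(c0 + 1, c1):
            left, right = solve(r0, c0, r1, c), solve(r0, c, r1, c1)
            total = left[0] + right[0]
            if total > best[0]:
                best = (total, (left[1], right[1]))
        return best

    return solve(0, 0, rows, cols)


if __name__ == "__main__":
    toy_grid = [[0, 0, 1],
                [0, 0, 1]]
    print(best_parse(make_toy_score(toy_grid), 2, 3))
```

Because sub-rectangles are shared between many candidate cuts, memoising each one is what makes the search a DP over a directed acyclic graph rather than an enumeration of all parse trees. On the toy grid, the highest-scoring parse under these assumptions is a single vertical cut separating the block of zeros from the column of ones.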

Cite

Text

Zhu et al. "Learning Reconfigurable Scene Representation by Tangram Model." IEEE/CVF Winter Conference on Applications of Computer Vision, 2012. doi:10.1109/WACV.2012.6163023

Markdown

[Zhu et al. "Learning Reconfigurable Scene Representation by Tangram Model." IEEE/CVF Winter Conference on Applications of Computer Vision, 2012.](https://mlanthology.org/wacv/2012/zhu2012wacv-learning/) doi:10.1109/WACV.2012.6163023

BibTeX

@inproceedings{zhu2012wacv-learning,
  title     = {{Learning Reconfigurable Scene Representation by Tangram Model}},
  author    = {Zhu, Jun and Wu, Tianfu and Zhu, Song-Chun and Yang, Xiaokang and Zhang, Wenjun},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
  year      = {2012},
  pages     = {449--456},
  doi       = {10.1109/WACV.2012.6163023},
  url       = {https://mlanthology.org/wacv/2012/zhu2012wacv-learning/}
}