Reconfigurable Models for Scene Recognition

Abstract

We propose a new latent variable model for scene recognition. Our approach represents a scene as a collection of region models ("parts") arranged in a reconfigurable pattern. We partition an image into a predefined set of regions and use a latent variable to specify which region model is assigned to each image region. In our current implementation we use a bag of words representation to capture the appearance of an image region. The resulting method generalizes a spatial bag of words approach that relies on a fixed model for the bag of words in each image region. Our models can be trained using both generative and discriminative methods. In the generative setting we use the Expectation-Maximization (EM) algorithm to estimate model parameters from a collection of images with category labels. In the discriminative setting we use a latent structural SVM (LSSVM). We note that LSSVMs can be very sensitive to initialization and demonstrate that generative training with EM provides a good initialization for discriminative training with LSSVM.
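The reconfigurable assignment described in the abstract can be illustrated with a small sketch. It rests on assumptions the abstract does not state: each region model is taken to be a linear weight vector over a bag-of-words histogram, and the latent choice of region model is made independently for each image region, so the best configuration is found region by region. The names `best_configuration`, `fixed_score`, and `classify` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Histogram = Sequence[float]  # bag-of-words counts for one image region
Weights = Sequence[float]    # one region model ("part"), assumed linear


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def best_configuration(
    regions: List[Histogram], parts: List[Weights]
) -> Tuple[float, List[int]]:
    """Pick, for every image region, the region model that scores highest.

    Assumption: the latent assignments are independent across regions,
    so the maximization over configurations factorizes per region.
    """
    total = 0.0
    assignment: List[int] = []
    for hist in regions:
        scores = [dot(w, hist) for w in parts]
        best = max(range(len(scores)), key=scores.__getitem__)
        assignment.append(best)
        total += scores[best]
    return total, assignment


def fixed_score(regions: List[Histogram], parts: List[Weights]) -> float:
    """Spatial bag-of-words baseline: region r always uses model r."""
    return sum(dot(w, hist) for w, hist in zip(parts, regions))


def classify(
    regions: List[Histogram], models: Dict[str, List[Weights]]
) -> str:
    """Return the category whose best configuration scores highest."""
    return max(models, key=lambda c: best_configuration(regions, models[c])[0])


# Toy data: two regions, a three-word vocabulary, two region models.
parts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
layout = [[3, 0, 1], [0, 4, 0]]
swapped = [layout[1], layout[0]]  # same content, regions exchanged

score_a, assign_a = best_configuration(layout, parts)
score_b, assign_b = best_configuration(swapped, parts)
```

In this toy case the reconfigurable score is the same for `layout` and `swapped` because the latent assignment follows the content, whereas `fixed_score` drops to zero on the swapped image. That is the sense in which a fixed spatial bag-of-words model is a special case: it corresponds to allowing only one configuration.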

Cite

Text

Parizi et al. "Reconfigurable Models for Scene Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2012. doi:10.1109/CVPR.2012.6248001

Markdown

[Parizi et al. "Reconfigurable Models for Scene Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2012.](https://mlanthology.org/cvpr/2012/parizi2012cvpr-reconfigurable/) doi:10.1109/CVPR.2012.6248001

BibTeX

@inproceedings{parizi2012cvpr-reconfigurable,
  title     = {{Reconfigurable Models for Scene Recognition}},
  author    = {Parizi, Sobhan Naderi and Oberlin, John G. and Felzenszwalb, Pedro F.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2012},
  pages     = {2775--2782},
  doi       = {10.1109/CVPR.2012.6248001},
  url       = {https://mlanthology.org/cvpr/2012/parizi2012cvpr-reconfigurable/}
}