Visual Dictionary Learning for Joint Object Categorization and Segmentation

Abstract

Representing objects using elements from a visual dictionary is widely used in object detection and categorization. Prior work on dictionary learning has shown that learning discriminative dictionaries improves the accuracy of object detection and categorization. However, none of these dictionaries is learnt for joint object categorization and segmentation. Moreover, dictionary learning is often done separately from classifier training, which reduces the discriminative power of the model. In this paper, we formulate the semantic segmentation problem as a joint categorization, segmentation and dictionary learning problem. To that end, we propose a latent conditional random field (CRF) model in which the observed variables are pixel category labels and the latent variables are visual word assignments. The CRF energy consists of a bottom-up segmentation cost, a top-down bag-of-(latent)-words categorization cost, and a dictionary learning cost. Together, these costs capture relationships between image features and visual words, relationships between visual words and object categories, and spatial relationships among visual words. The segmentation, categorization, and dictionary learning parameters are learnt jointly using latent structural SVMs, and the segmentation and visual word assignments are inferred jointly using energy minimization techniques. Experiments on the Graz02 and CamVid datasets demonstrate the performance of our approach.
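To make the structure of the three-part CRF energy concrete, here is a minimal toy sketch of how such an energy could be evaluated for a given labelling x and latent word assignment h. It is not the authors' formulation: the function name crf_energy, all parameter names, and the specific functional forms (unary costs plus a Potts penalty for segmentation, a Potts penalty on neighbouring words for spatial word relationships, squared feature-to-word distance for the dictionary cost, and a negated linear score on per-category word histograms for categorization) are assumptions chosen for illustration. Parameter learning with latent structural SVMs and the energy minimization step are not shown.

```python
import numpy as np


def crf_energy(features, labels, words, dictionary, unary, w_cat,
               edges, potts_weight, word_smooth):
    """Hypothetical toy energy E(x, h); lower is better.

    features:     (N, D) feature vector per node (pixel or superpixel)
    labels:       (N,) observed category label per node (x)
    words:        (N,) latent visual-word index per node (h)
    dictionary:   (K, D) visual words
    unary:        (N, C) bottom-up cost of each category at each node
    w_cat:        (C, K) top-down weight of word k for category c
    edges:        list of (i, j) neighbouring node pairs
    potts_weight: penalty when neighbours take different labels (assumed)
    word_smooth:  penalty when neighbours take different words (assumed)
    """
    idx = np.arange(len(labels))

    # Bottom-up segmentation cost: unary terms plus Potts smoothness on labels.
    seg = unary[idx, labels].sum()
    seg += potts_weight * sum(int(labels[i] != labels[j]) for i, j in edges)

    # Spatial relationships among visual words: Potts penalty on neighbouring words.
    spatial = word_smooth * sum(int(words[i] != words[j]) for i, j in edges)

    # Dictionary cost: squared distance between each feature and its assigned word.
    dict_cost = ((features - dictionary[words]) ** 2).sum()

    # Top-down bag-of-(latent)-words cost: for each category present, build the
    # histogram of words inside its region and score it linearly (negated so
    # that a good match lowers the energy).
    num_words = dictionary.shape[0]
    cat = 0.0
    for c in np.unique(labels):
        hist = np.bincount(words[labels == c], minlength=num_words)
        cat -= w_cat[c] @ hist

    return float(seg + spatial + dict_cost + cat)


# Usage on a two-node toy problem.
features = np.array([[0.0, 0.0], [1.0, 1.0]])
dictionary = np.array([[0.0, 0.0], [1.0, 1.0]])
unary = np.array([[0.0, 1.0], [1.0, 0.0]])
w_cat = np.array([[1.0, 0.0], [0.0, 1.0]])
edges = [(0, 1)]
energy = crf_energy(features, np.array([0, 1]), np.array([0, 1]),
                    dictionary, unary, w_cat, edges, 0.5, 0.25)
print(energy)  # -1.25
```

In this sketch, joint inference would amount to searching over both labels and words for the minimum of crf_energy; any real implementation would use a dedicated energy minimization method rather than exhaustive search.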

Cite

Text

Jain et al. "Visual Dictionary Learning for Joint Object Categorization and Segmentation." European Conference on Computer Vision, 2012. doi:10.1007/978-3-642-33715-4_52

Markdown

[Jain et al. "Visual Dictionary Learning for Joint Object Categorization and Segmentation." European Conference on Computer Vision, 2012.](https://mlanthology.org/eccv/2012/jain2012eccv-visual/) doi:10.1007/978-3-642-33715-4_52

BibTeX

@inproceedings{jain2012eccv-visual,
  title     = {{Visual Dictionary Learning for Joint Object Categorization and Segmentation}},
  author    = {Jain, Aastha and Zappella, Luca and McClure, Patrick and Vidal, René},
  booktitle = {European Conference on Computer Vision},
  year      = {2012},
  pages     = {718--731},
  doi       = {10.1007/978-3-642-33715-4_52},
  url       = {https://mlanthology.org/eccv/2012/jain2012eccv-visual/}
}