Top-Down Visual Saliency via Joint CRF and Dictionary Learning

Abstract

Top-down visual saliency facilitates object localization by providing a discriminative representation of target objects and a probability map for reducing the search space. In this paper, we propose a novel top-down saliency model that jointly learns a Conditional Random Field (CRF) and a discriminative dictionary. The proposed model is formulated based on a CRF with latent variables. By using sparse codes as latent variables, we jointly train a dictionary modulated by the CRF and a CRF driven by sparse coding. We propose a max-margin approach to train our model via fast inference algorithms. We evaluate our model on the Graz-02 and PASCAL VOC 2007 datasets. Experimental results show that our model performs favorably against state-of-the-art top-down saliency methods. We also observe that the dictionary update significantly improves the model performance.
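As a rough illustration of the kind of model the abstract describes (a sketch with assumed notation, not the paper's exact formulation), a CRF with sparse codes as latent variables can be written as an energy over pixel labels $\mathbf{y}$ and latent codes $\mathbf{s}$, where the codes are tied to a dictionary $D$:

```latex
% Illustrative latent-CRF energy; all symbols here are assumptions,
% not the authors' notation.
E(\mathbf{y}, \mathbf{s} \mid \mathbf{x}; \mathbf{w}, D)
  = \sum_{i} \psi\!\left(y_i, s_i; \mathbf{w}\right)
  + \sum_{(i,j) \in \mathcal{E}} \phi\!\left(y_i, y_j; \mathbf{w}\right),
\qquad
s_i \approx \arg\min_{s}\; \lVert x_i - D s \rVert_2^2 + \lambda \lVert s \rVert_1 .
```

Here the unary potentials $\psi$ score sparse codes of local features against the label, the pairwise potentials $\phi$ encourage label smoothness over the graph edges $\mathcal{E}$, and joint learning would alternate between updating the CRF weights $\mathbf{w}$ (e.g., via a max-margin objective) and updating the dictionary $D$ so that the codes become discriminative for the target class.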

Cite

Text

Yang and Yang. "Top-Down Visual Saliency via Joint CRF and Dictionary Learning." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2012. doi:10.1109/CVPR.2012.6247940

Markdown

[Yang and Yang. "Top-Down Visual Saliency via Joint CRF and Dictionary Learning." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2012.](https://mlanthology.org/cvpr/2012/yang2012cvpr-top/) doi:10.1109/CVPR.2012.6247940

BibTeX

@inproceedings{yang2012cvpr-top,
  title     = {{Top-Down Visual Saliency via Joint CRF and Dictionary Learning}},
  author    = {Yang, Jimei and Yang, Ming-Hsuan},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2012},
  pages     = {2296--2303},
  doi       = {10.1109/CVPR.2012.6247940},
  url       = {https://mlanthology.org/cvpr/2012/yang2012cvpr-top/}
}