CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection

Zhang, Lu; Zhang, Jianming; Lin, Zhe; Lu, Huchuan; He, You

doi:10.1109/CVPR.2019.00618


CVPR 2019


Abstract

Detecting salient objects in cluttered scenes is a big challenge. To address this problem, we argue that the model needs to learn discriminative semantic features for salient objects. To this end, we propose to leverage captioning as an auxiliary semantic task to boost salient object detection in complex scenarios. Specifically, we develop a CapSal model which consists of two sub-networks, the Image Captioning Network (ICN) and the Local-Global Perception Network (LGPN). ICN encodes the embedding of a generated caption to capture the semantic information of major objects in the scene, while LGPN incorporates the captioning embedding with local-global visual contexts for predicting the saliency map. ICN and LGPN are jointly trained to model high-level semantics as well as visual saliency. Extensive experiments demonstrate the effectiveness of image captioning in boosting the performance of salient object detection. In particular, our model performs significantly better than the state-of-the-art methods on several challenging datasets of complex scenarios.
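The data flow described in the abstract can be illustrated with a small, purely hypothetical sketch. It is not the authors' implementation: the function names, the average-pooled global context, the per-pixel linear layer, and the weighted-sum joint loss are all assumptions chosen for illustration. It only shows the idea of combining a caption embedding (the ICN output) with local and global visual features (as in LGPN) to predict a saliency map, and of training both tasks with a single objective.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # height x width x channels


def global_context(local: FeatureMap) -> Vector:
    """Average-pool local features over all positions (assumed stand-in for global context)."""
    h, w, c = len(local), len(local[0]), len(local[0][0])
    return [
        sum(local[i][j][k] for i in range(h) for j in range(w)) / (h * w)
        for k in range(c)
    ]


def predict_saliency(local: FeatureMap, caption_emb: Vector,
                     weights: Vector, bias: float) -> List[List[float]]:
    """Concatenate local, global, and caption features at each pixel,
    then apply a per-pixel linear layer followed by a sigmoid (hypothetical fusion)."""
    g = global_context(local)
    saliency = []
    for row in local:
        out_row = []
        for feat in row:
            fused = feat + g + caption_emb  # list concatenation
            if len(fused) != len(weights):
                raise ValueError("weights must match the fused feature length")
            logit = sum(wt * x for wt, x in zip(weights, fused)) + bias
            out_row.append(1.0 / (1.0 + math.exp(-logit)))
        saliency.append(out_row)
    return saliency


def bce_loss(pred: List[List[float]], target: List[List[float]],
             eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between a predicted and a ground-truth saliency map."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def joint_loss(saliency_loss: float, caption_loss: float,
               caption_weight: float = 1.0) -> float:
    """Assumed joint objective: saliency loss plus a weighted captioning loss."""
    return saliency_loss + caption_weight * caption_loss


# Toy usage: a 2x2 feature map with 2 channels and a 3-dimensional caption embedding.
local = [[[0.2, 0.1], [0.9, 0.8]], [[0.1, 0.0], [0.7, 0.9]]]
caption_emb = [0.5, -0.3, 0.1]
weights = [1.0, 1.0, 0.0, 0.0, 0.2, 0.2, 0.2]  # 2 local + 2 global + 3 caption
saliency = predict_saliency(local, caption_emb, weights, bias=-1.0)
```

In this sketch the caption embedding is the same at every pixel, so it acts as an image-level semantic cue that shifts the per-pixel prediction, while the local features carry the spatial detail.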


Cite

Text

Zhang et al. "CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. doi:10.1109/CVPR.2019.00618

Markdown

[Zhang et al. "CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.](https://mlanthology.org/cvpr/2019/zhang2019cvpr-capsal/) doi:10.1109/CVPR.2019.00618

BibTeX

@inproceedings{zhang2019cvpr-capsal,
  title     = {{CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection}},
  author    = {Zhang, Lu and Zhang, Jianming and Lin, Zhe and Lu, Huchuan and He, You},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2019},
  doi       = {10.1109/CVPR.2019.00618},
  url       = {https://mlanthology.org/cvpr/2019/zhang2019cvpr-capsal/}
}