Salient Object Ranking via Cyclical Perception-Viewing Interaction Modeling

Abstract

Salient Object Ranking (SOR) aims to predict how human attention shifts across the salient objects in a scene. Although many methods have been proposed for this task, they typically model only the bottom-up influence of image features on attention shifts. In this work, we observe that when free-viewing an image, humans instinctively browse its objects in a way that maximizes their contextual understanding of the image. This implies a cyclical interaction between understanding the content (or story) of the image and shifting attention over it. Based on this observation, we propose a novel SOR approach that explicitly models this top-down cognitive pathway with two novel modules: a story prediction (SP) module and a guided ranking (GR) module. By formulating content understanding as an image caption generation task, the SP module learns to generate and complete image captions conditioned on the salient object queries of the GR module, while the GR module learns to detect salient objects and their viewing order under the guidance of the SP module. Extensive experiments on SOR benchmarks demonstrate that our approach outperforms state-of-the-art SOR methods.

Cite

Text

Guo et al. "Salient Object Ranking via Cyclical Perception-Viewing Interaction Modeling." International Conference on Learning Representations, 2026.

Markdown

[Guo et al. "Salient Object Ranking via Cyclical Perception-Viewing Interaction Modeling." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/guo2026iclr-salient/)

BibTeX

@inproceedings{guo2026iclr-salient,
  title     = {{Salient Object Ranking via Cyclical Perception-Viewing Interaction Modeling}},
  author    = {Guo, Rongjin and Xu, Ke and Lau, Rynson W. H.},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/guo2026iclr-salient/}
}