Find and Perceive: Tell Visual Change with Fine-Grained Comparison

Lv, Feixiao; Wang, Rui; Jing, Lihua; Liu, Lijun

doi:10.24963/IJCAI.2025/654

IJCAI 2025 pp. 5878-5886

Abstract

The goal of image change captioning is to capture the differences between two similar images and describe them in natural language. In this paper, we decompose the task into two sub-problems: fine-grained change feature learning and discrimination of changed regions. Whereas existing methods focus only on change feature learning, we propose a novel change captioning learning paradigm, Find and Perceive (F&P). F&P has two main components: the Fine-Grained Semantic Change Perception (FGSCP) module, which improves the model's perception of subtle changes, and the Weakly-Supervised Discriminator (WSD) of changed regions, which improves the model's sensitivity in localising the important regions. Specifically, the FGSCP works in two steps, first introducing fine-grained categorisation and then enhancing the interaction between the two paired images. The WSD uses the contribution of each image region to the final generated caption to accurately indicate which regions are important for change captions, without any extra annotations. Finally, we conduct extensive experiments on four change captioning datasets; the results show that F&P outperforms existing change captioning methods and achieves new state-of-the-art performance.

Cite

Text

Lv et al. "Find and Perceive: Tell Visual Change with Fine-Grained Comparison." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/654

Markdown

[Lv et al. "Find and Perceive: Tell Visual Change with Fine-Grained Comparison." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/lv2025ijcai-find/) doi:10.24963/IJCAI.2025/654

BibTeX

@inproceedings{lv2025ijcai-find,
  title     = {{Find and Perceive: Tell Visual Change with Fine-Grained Comparison}},
  author    = {Lv, Feixiao and Wang, Rui and Jing, Lihua and Liu, Lijun},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {5878-5886},
  doi       = {10.24963/IJCAI.2025/654},
  url       = {https://mlanthology.org/ijcai/2025/lv2025ijcai-find/}
}