Find and Perceive: Tell Visual Change with Fine-Grained Comparison
Abstract
Image change captioning aims to capture the differences between two similar images and describe them in natural language. In this paper, we decompose the task into two sub-problems: fine-grained change feature learning and discrimination of changed regions. Whereas existing methods focus only on change feature learning, we propose a novel change captioning learning paradigm, Find and Perceive (F&P). F&P consists of two main components: the Fine-Grained Semantic Change Perception (FGSCP) module, which improves the model's ability to perceive subtle changes, and the Weakly-Supervised Discriminator (WSD) of changed regions, which improves the model's sensitivity in localising important regions. Specifically, FGSCP works in two steps, first introducing fine-grained categorisation and then enhancing the interaction between the two paired images. The WSD uses the contribution of each image region to the final generated caption, accurately indicating which regions are important for change captions without any extra annotations. Finally, we conduct extensive experiments on four change captioning datasets, and the results show that F&P outperforms existing change captioning methods and achieves new state-of-the-art performance.
Cite
Text
Lv et al. "Find and Perceive: Tell Visual Change with Fine-Grained Comparison." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/654
Markdown
[Lv et al. "Find and Perceive: Tell Visual Change with Fine-Grained Comparison." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/lv2025ijcai-find/) doi:10.24963/IJCAI.2025/654
BibTeX
@inproceedings{lv2025ijcai-find,
title = {{Find and Perceive: Tell Visual Change with Fine-Grained Comparison}},
author = {Lv, Feixiao and Wang, Rui and Jing, Lihua and Liu, Lijun},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {5878-5886},
doi = {10.24963/IJCAI.2025/654},
url = {https://mlanthology.org/ijcai/2025/lv2025ijcai-find/}
}