Region-Aware Difference Distilling with Attribute-Guided Contrastive Regularization for Change Captioning

Abstract

Change captioning aims to describe the differences between two similar images in natural language, which aids in understanding and monitoring changes. This challenging task requires a fine-grained understanding of subtle changes while resisting disturbances such as viewpoint shifts and illumination variations. Existing methods often rely solely on global difference features and lack comprehensive alignment between linguistic and visual information, so they overlook fine-grained details and generate semantically hallucinated sentences. To address these limitations, we propose the region-aware difference distilling (RDD) network with attribute-guided contrastive regularization (ACR). RDD progressively distills regional difference features from global difference features via learnable vectors, allowing for more precise identification of changed regions. ACR enhances alignment between linguistic and visual information by formulating Nouns-to-Objects (N2O) and Verbs-to-Actions (V2A) alignment losses that regularize the regional difference features. Results on three datasets show that our method outperforms state-of-the-art change captioning methods.
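The abstract does not spell out the form of the N2O and V2A alignment losses. As a rough illustration of what a contrastive alignment between regional difference features and word embeddings could look like, the sketch below uses a symmetric InfoNCE loss. This is an assumption, not the paper's formulation: the function name `info_nce`, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(visual, textual, temperature=0.1):
    """Symmetric InfoNCE over a batch (assumed loss form, not from the paper).

    visual[i] and textual[i] are treated as a matched pair; every other
    combination in the batch acts as a negative.
    """
    vis = [l2_normalize(v) for v in visual]
    txt = [l2_normalize(t) for t in textual]
    n = len(vis)

    # Cosine similarities scaled by the temperature.
    logits = [[dot(vis[i], txt[j]) / temperature for j in range(n)]
              for i in range(n)]

    def cross_entropy(row, target):
        # Numerically stable log-sum-exp minus the target logit.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        return log_z - row[target]

    # Visual-to-text direction uses rows, text-to-visual uses columns.
    v2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2v = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return 0.5 * (v2t + t2v)


# Toy example: two regional difference features and two noun embeddings.
regional = [[1.0, 0.0], [0.0, 1.0]]
nouns_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each noun matches its region
nouns_swapped = [[0.1, 0.9], [0.9, 0.1]]   # pairs are mismatched

loss_aligned = info_nce(regional, nouns_aligned)
loss_swapped = info_nce(regional, nouns_swapped)
```

In this toy setup the loss is lower when each regional feature is paired with its matching noun embedding than when the pairs are swapped, which is the behavior a regularizer of this kind relies on. The same function could in principle be applied to verb embeddings for a V2A-style term.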

Cite

Text

Li et al. "Region-Aware Difference Distilling with Attribute-Guided Contrastive Regularization for Change Captioning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I5.32517

Markdown

[Li et al. "Region-Aware Difference Distilling with Attribute-Guided Contrastive Regularization for Change Captioning." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/li2025aaai-region/) doi:10.1609/AAAI.V39I5.32517

BibTeX

@inproceedings{li2025aaai-region,
  title     = {{Region-Aware Difference Distilling with Attribute-Guided Contrastive Regularization for Change Captioning}},
  author    = {Li, Rong and Li, Liang and Zhang, Jiehua and Zhao, Qiang and Wang, Hongkui and Yan, Chenggang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {4887--4895},
  doi       = {10.1609/AAAI.V39I5.32517},
  url       = {https://mlanthology.org/aaai/2025/li2025aaai-region/}
}