DECIDER: Difference-Aware Contrastive Diffusion Model with Adversarial Perturbations for Image Change Captioning

Abstract

Image change captioning (ICC) requires describing subtle differences between two similar images in natural language, which makes feature extraction and cross-modal learning considerably harder than in standard image captioning. Existing ICC methods often suffer from two key problems: 1) the large amount of irrelevant information in single-image features leads to suboptimal visual difference representations; 2) imprecise inter-modality correspondence degrades the quality of generated captions. Motivated by the strong performance of diffusion models in image and text generation, this paper proposes a Difference-aware Contrastive Diffusion Model with Adversarial Perturbations (DECIDER) for ICC. Technically, difference-aware cross-modal learning is developed to suppress irrelevant information and learn compact yet robust visual difference representations. This is achieved by optimizing a novel objective derived from the information bottleneck principle, which filters redundant features and highlights differences. Furthermore, we propose to dynamically generate "hard" positive and negative samples via adversarial perturbations, which are used in contrastive diffusion training with a tighter variational bound. This design encourages DECIDER to discover and model complex correspondences between visual differences and captions, thereby improving generalization. Extensive experiments on four datasets demonstrate that DECIDER significantly outperforms state-of-the-art methods.
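
For readers unfamiliar with the information bottleneck principle mentioned above, its generic textbook form can be written as the trade-off below. This is only a sketch for orientation: the abstract does not state the objective that DECIDER actually optimizes, and the mapping of X, Y, and Z onto image-pair features, captions, and difference representations is an assumption made here for illustration.

```latex
% Generic information bottleneck Lagrangian (NOT the paper's derived objective).
% X: input (assumed here: features of the image pair)
% Y: target (assumed here: the change caption)
% Z: learned representation (assumed here: the visual difference representation)
% \beta > 0: weight trading compression against predictiveness
\[
  \min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
\]
```

Minimizing I(X; Z) discards information in the input that the representation does not need, while the second term rewards keeping whatever is predictive of the target. This matches the abstract's description of filtering redundant features while highlighting differences.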

Cite

Text

Zhong et al. "DECIDER: Difference-Aware Contrastive Diffusion Model with Adversarial Perturbations for Image Change Captioning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I10.33158

Markdown

[Zhong et al. "DECIDER: Difference-Aware Contrastive Diffusion Model with Adversarial Perturbations for Image Change Captioning." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhong2025aaai-decider/) doi:10.1609/AAAI.V39I10.33158

BibTeX

@inproceedings{zhong2025aaai-decider,
  title     = {{DECIDER: Difference-Aware Contrastive Diffusion Model with Adversarial Perturbations for Image Change Captioning}},
  author    = {Zhong, Guojin and Hu, Jinhong and Chen, Jiajun and Yuan, Jin and Pan, Wenbo},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {10662-10670},
  doi       = {10.1609/AAAI.V39I10.33158},
  url       = {https://mlanthology.org/aaai/2025/zhong2025aaai-decider/}
}