CSF-GAN: Cross-Modal Semantic Fusion-Based Generative Adversarial Network for Text-Guided Image Inpainting

Abstract

Most visual-guided image inpainting methods based on generative adversarial networks (GANs) struggle when the missing region has weak correlations with the surrounding visual context. Recently, diffusion-based methods guided by textual context have been proposed to address this limitation by leveraging additional semantic information to restore corrupted objects. However, these models typically involve more parameters and exhibit slower generation speeds compared to GAN-based approaches. To address this problem, we propose a novel text-guided image inpainting model, the cross-modal semantic fusion generative adversarial network (CSF-GAN). CSF-GAN is designed as a one-stage GAN with the following key contributions. First, a novel semantic fusion module (SFM) is introduced to integrate sentence- and word-level textual context into the inpainting process, enabling more effective guidance from multi-granularity semantic information. Second, a newly designed word-level local discriminator provides detailed feedback to the generator, enhancing the accuracy of generated content in alignment with word-level semantics. Third, two loss functions, the inpainting loss and edge loss, are employed to enhance both structural coherence and textural realism in the generated results. Extensive experiments on two benchmark datasets demonstrate that CSF-GAN outperforms state-of-the-art methods.

Cite

Text

Zhang et al. "CSF-GAN: Cross-Modal Semantic Fusion-Based Generative Adversarial Network for Text-Guided Image Inpainting." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/265

Markdown

[Zhang et al. "CSF-GAN: Cross-Modal Semantic Fusion-Based Generative Adversarial Network for Text-Guided Image Inpainting." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/zhang2025ijcai-csf/) doi:10.24963/IJCAI.2025/265

BibTeX

@inproceedings{zhang2025ijcai-csf,
  title     = {{CSF-GAN: Cross-Modal Semantic Fusion-Based Generative Adversarial Network for Text-Guided Image Inpainting}},
  author    = {Zhang, Shilin and Wang, Suixue and Zhang, Qingchen and Zhao, Liang and Huo, Weiliang and Hou, Sijia and Fu, Chunjiang},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {2377--2385},
  doi       = {10.24963/IJCAI.2025/265},
  url       = {https://mlanthology.org/ijcai/2025/zhang2025ijcai-csf/}
}