Causal Inference over Visual-Semantic-Aligned Graph for Image Classification

Meng, Lei; Li, Xiangxian; Yan, Xiaoshuo; Ma, Haokai; Qi, Zhuang; Wu, Wei; Meng, Xiangxu

doi:10.1609/AAAI.V39I18.34141

AAAI 2025 pp. 19449-19457

Abstract

Incorporating tagging information to regularize image representation learning usually improves image classification by aligning visual features with textual features of higher discriminative power. Existing methods typically follow a predictive approach that uses tags as semantic labels for the visual input. However, they struggle to handle the heterogeneity between modalities. To learn an accurate visual-semantic mapping, this paper presents a visual-semantic causal association modeling framework termed VSCNet. It aligns visual regions with tags, uses a pre-learned hierarchy of visual and semantic exemplars to refine tag predictions, and constructs an augmented heterogeneous graph to perform causal intervention. Specifically, the fine-grained visual-semantic alignment (FVA) module adaptively locates the semantic-intensive regions corresponding to tags. The heterogeneous association refinement (HAR) module associates the visual regions, semantic elements, and pre-learned visual prototypes in a heterogeneous graph to filter out erroneous predictions and enrich the information. The causal inference with graphical masking (CIM) module applies self-learned masks to discover the causal nodes and edges in the heterogeneous graph, addressing spurious associations and forming robust causal representations. Experimental results on two benchmark datasets show that VSCNet effectively builds visual-semantic associations from images and, with the enriched predictive information, outperforms state-of-the-art methods.
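The masking idea behind the CIM module can be illustrated with a toy example. This is a minimal sketch and not the authors' implementation: the function `masked_aggregate`, the node names, and the hand-set mask logits are all hypothetical, and the abstract does not specify how VSCNet parameterizes or trains its masks. The sketch only shows how soft masks on nodes and edges can suppress a spurious neighbor during one round of aggregation over a small heterogeneous graph.

```python
import math


def sigmoid(x):
    """Map a real-valued logit to a soft mask value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def masked_aggregate(features, edges, node_logits, edge_logits):
    """One round of mask-weighted neighbor averaging (illustrative only).

    features:    dict mapping node name -> list of floats
    edges:       list of directed (src, dst) pairs
    node_logits: dict mapping node name -> mask logit (hand-set here;
                 a real model would learn these)
    edge_logits: dict mapping (src, dst) -> mask logit
    """
    dim = len(next(iter(features.values())))
    out = {}
    for node in features:
        total = [0.0] * dim
        weight_sum = 0.0
        for src, dst in edges:
            if dst != node:
                continue
            # A message counts only if both its source node and its edge
            # are kept by the masks.
            w = sigmoid(node_logits[src]) * sigmoid(edge_logits[(src, dst)])
            for i in range(dim):
                total[i] += w * features[src][i]
            weight_sum += w
        # Nodes with no incoming messages keep their own features.
        if weight_sum > 0:
            out[node] = [t / weight_sum for t in total]
        else:
            out[node] = list(features[node])
    return out


# Hypothetical three-node graph: a visual region receives messages from a
# tag node and from a visual prototype node.
features = {
    "region": [0.5, 0.5],
    "tag": [1.0, 0.0],
    "prototype": [0.0, 1.0],
}
edges = [("tag", "region"), ("prototype", "region")]
# Assumption for illustration: the tag is treated as causal (mask near 1)
# and the prototype as spurious (mask near 0).
node_logits = {"region": 0.0, "tag": 20.0, "prototype": -20.0}
edge_logits = {("tag", "region"): 0.0, ("prototype", "region"): 0.0}

result = masked_aggregate(features, edges, node_logits, edge_logits)
```

With these hand-set logits, the aggregated `region` representation is dominated by the `tag` node, because the `prototype` message is masked out almost entirely. In a trained model the logits would be parameters optimized with the classification objective rather than constants.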

Cite

Text

Meng et al. "Causal Inference over Visual-Semantic-Aligned Graph for Image Classification." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I18.34141

Markdown

[Meng et al. "Causal Inference over Visual-Semantic-Aligned Graph for Image Classification." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/meng2025aaai-causal/) doi:10.1609/AAAI.V39I18.34141

BibTeX

@inproceedings{meng2025aaai-causal,
  title     = {{Causal Inference over Visual-Semantic-Aligned Graph for Image Classification}},
  author    = {Meng, Lei and Li, Xiangxian and Yan, Xiaoshuo and Ma, Haokai and Qi, Zhuang and Wu, Wei and Meng, Xiangxu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {19449--19457},
  doi       = {10.1609/AAAI.V39I18.34141},
  url       = {https://mlanthology.org/aaai/2025/meng2025aaai-causal/}
}