Causal Inference over Visual-Semantic-Aligned Graph for Image Classification
Abstract
Incorporating tagging information to regularize the representation learning of images usually improves image classification performance by aligning visual features with textual features of higher discriminative power. Existing methods typically follow a predictive approach, which uses tags as semantic labels for the visual input when making predictions. However, these methods struggle to handle the heterogeneity between modalities. To learn an accurate visual-semantic mapping, this paper presents a visual-semantic causal association modeling framework termed VSCNet. It aligns visual regions with tags, uses a pre-learned hierarchy of visual and semantic exemplars to refine tag predictions, and constructs an augmented heterogeneous graph on which to perform causal intervention. Specifically, the fine-grained visual-semantic alignment (FVA) module adaptively locates the semantic-intensive regions corresponding to tags. The heterogeneous association refinement (HAR) module associates the visual regions, semantic elements, and pre-learned visual prototypes in a heterogeneous graph to filter out erroneous predictions and enrich the information. The causal inference with graphical masking (CIM) module applies self-learned masks to discover the causal nodes and edges in the heterogeneous graph, addressing spurious associations and forming robust causal representations. Experimental results on two benchmark datasets show that VSCNet effectively builds visual-semantic associations from images and, with this enriched predictive information, outperforms state-of-the-art methods.
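The abstract does not spell out how the CIM module's masks are parameterized, so the following is only a minimal sketch of the general idea of soft node and edge masking on a graph, not the authors' implementation. The function name `masked_graph_readout`, the per-node and per-edge logits, the single round of mean aggregation, and the mask-weighted readout are all assumptions made for illustration; in a real model the logits would be learned by gradient descent rather than set by hand.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def masked_graph_readout(node_feats, adj, node_logits, edge_logits):
    """One round of mean-aggregation message passing under soft masks.

    node_feats:  N lists of D floats (e.g. region, tag and prototype nodes)
    adj:         N x N 0/1 adjacency matrix
    node_logits: N hypothetical learnable scores, one per node
    edge_logits: N x N hypothetical learnable scores, one per edge
    Returns a D-dimensional graph-level representation.
    """
    n = len(node_feats)
    dim = len(node_feats[0])

    # Soft node mask in (0, 1): nodes with low scores are suppressed.
    node_mask = [sigmoid(s) for s in node_logits]
    masked = [[node_mask[i] * v for v in node_feats[i]] for i in range(n)]

    updated = []
    for i in range(n):
        # Soft edge mask, zeroed wherever the graph has no edge.
        weights = [sigmoid(edge_logits[i][j]) * adj[i][j] for j in range(n)]
        total = sum(weights)
        row = list(masked[i])
        if total > 0:  # isolated nodes keep their own masked features
            for j in range(n):
                w = weights[j] / total
                for d in range(dim):
                    row[d] += w * masked[j][d]
        updated.append(row)

    # Mask-weighted mean over nodes as the graph representation.
    mask_sum = sum(node_mask)
    return [
        sum(node_mask[i] * updated[i][d] for i in range(n)) / mask_sum
        for d in range(dim)
    ]
```

In this sketch, a node whose logit is strongly negative contributes almost nothing to the readout, which is the sense in which a mask can screen out a spuriously associated node. Whether VSCNet uses soft or hard masks, and how it aggregates, would need to be checked against the paper itself.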
Cite
Text
Meng et al. "Causal Inference over Visual-Semantic-Aligned Graph for Image Classification." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I18.34141
Markdown
[Meng et al. "Causal Inference over Visual-Semantic-Aligned Graph for Image Classification." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/meng2025aaai-causal/) doi:10.1609/AAAI.V39I18.34141
BibTeX
@inproceedings{meng2025aaai-causal,
title = {{Causal Inference over Visual-Semantic-Aligned Graph for Image Classification}},
author = {Meng, Lei and Li, Xiangxian and Yan, Xiaoshuo and Ma, Haokai and Qi, Zhuang and Wu, Wei and Meng, Xiangxu},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {19449-19457},
doi = {10.1609/AAAI.V39I18.34141},
url = {https://mlanthology.org/aaai/2025/meng2025aaai-causal/}
}