STiL: Semi-Supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification
Abstract
Multimodal image-tabular learning is gaining attention, yet it faces challenges due to limited labeled data. While earlier work has applied self-supervised learning (SSL) to unlabeled data, its task-agnostic nature often results in learning suboptimal features for downstream tasks. Semi-supervised learning (SemiSL), which combines labeled and unlabeled data, offers a promising solution. However, existing multimodal SemiSL methods typically focus on unimodal or modality-shared features, ignoring valuable task-relevant modality-specific information, leading to a Modality Information Gap. In this paper, we propose STiL, a novel SemiSL tabular-image framework that addresses this gap by comprehensively exploring task-relevant information. STiL features a new disentangled contrastive consistency module to learn cross-modal invariant representations of shared information while retaining modality-specific information via disentanglement. We also propose a novel consensus-guided pseudo-labeling strategy to generate reliable pseudo-labels based on classifier consensus, along with a new prototype-guided label smoothing technique to refine pseudo-label quality with prototype embeddings, thereby enhancing task-relevant information learning in unlabeled data. Experiments on natural and medical image datasets show that STiL outperforms the state-of-the-art supervised/SSL/SemiSL image/multimodal approaches. Our code is available at https://github.com/siyi-wind/STiL.
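The abstract only describes the method at a high level. As a rough illustration of the consensus-guided pseudo-labeling and prototype-guided label smoothing ideas, the minimal PyTorch-style sketch below shows one plausible reading of them; it is not the authors' implementation, and the function name, the confidence threshold tau, and the smoothing weight alpha are assumptions introduced here for illustration. See the linked repository for the actual code.

import torch
import torch.nn.functional as F

def make_pseudo_labels(logits_img, logits_tab, logits_fused,
                       feats, prototypes, tau=0.9, alpha=0.5):
    """Illustrative sketch, not the official STiL code.

    Consensus-guided pseudo-labeling: keep an unlabeled sample only if the
    image, tabular, and fused classifiers agree on the predicted class and
    the fused prediction is sufficiently confident.

    Prototype-guided label smoothing: soften the retained one-hot pseudo-label
    using the sample's similarity to class prototype embeddings.
    """
    p_img, p_tab, p_fused = (F.softmax(z, dim=1)
                             for z in (logits_img, logits_tab, logits_fused))
    num_classes = logits_fused.size(1)

    # Classifier consensus: all heads predict the same class.
    pred_img, pred_tab, pred_fused = (p.argmax(dim=1) for p in (p_img, p_tab, p_fused))
    consensus = (pred_img == pred_fused) & (pred_tab == pred_fused)

    # Confidence filter on the fused prediction (threshold tau is an assumption).
    conf, _ = p_fused.max(dim=1)
    mask = consensus & (conf >= tau)

    # Prototype-guided smoothing: cosine similarity to class prototypes,
    # turned into a soft distribution and blended with the hard pseudo-label.
    sim = F.normalize(feats, dim=1) @ F.normalize(prototypes, dim=1).t()
    proto_probs = F.softmax(sim, dim=1)
    hard = F.one_hot(pred_fused, num_classes).float()
    soft_labels = alpha * hard + (1.0 - alpha) * proto_probs

    return soft_labels, mask

# Example usage on a batch of 8 unlabeled samples with 4 classes and 128-d features:
# soft_labels, mask = make_pseudo_labels(torch.randn(8, 4), torch.randn(8, 4),
#                                        torch.randn(8, 4), torch.randn(8, 128),
#                                        prototypes=torch.randn(4, 128))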
Cite
Text
Du et al. "STiL: Semi-Supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01449Markdown
[Du et al. "STiL: Semi-Supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/du2025cvpr-stil/) doi:10.1109/CVPR52734.2025.01449BibTeX
@inproceedings{du2025cvpr-stil,
title = {{STiL: Semi-Supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification}},
author = {Du, Siyi and Luo, Xinzhe and O'Regan, Declan P. and Qin, Chen},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {15549-15559},
doi = {10.1109/CVPR52734.2025.01449},
url = {https://mlanthology.org/cvpr/2025/du2025cvpr-stil/}
}