BiMAC: Bidirectional Multimodal Alignment in Contrastive Learning

Abstract

Achieving robust performance in vision-language tasks requires strong multimodal alignment between textual and visual representations. Existing frameworks often combine contrastive learning with image captioning to unify visual and textual representations. However, their reliance on global representations and on unidirectional information flow from images to text limits their ability to reconstruct visual content accurately from textual descriptions. To address this limitation, we propose BiMAC, a novel framework that enables bidirectional interactions between images and text at both global and local levels. BiMAC simultaneously reconstructs visual content from textual cues and generates textual descriptions guided by visual features. Its text-region alignment mechanism identifies and selects the image patches most relevant to the text for precise cross-modal interaction, reducing noise from irrelevant regions and improving the accuracy of the cross-modal mapping. BiMAC achieves state-of-the-art performance across diverse vision-language tasks, including image-text retrieval, captioning, and classification.
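To make two ideas from the abstract concrete, contrastive alignment of global image and text embeddings and the selection of text-relevant image patches, here is a minimal sketch in plain Python. It is not BiMAC's implementation; the abstract does not specify its losses or architecture. `symmetric_contrastive_loss` is a standard CLIP-style InfoNCE objective averaged over the image-to-text and text-to-image directions (this symmetry is not the same thing as the bidirectional reconstruction BiMAC describes). `select_patches` is a hypothetical stand-in for text-region alignment that keeps the k patches with the highest cosine similarity to a text embedding. The function names, the temperature of 0.07, and the top-k rule are all assumptions for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (returns zeros for a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [0.0 for _ in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_entropy_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(images, texts, temperature=0.07):
    """CLIP-style InfoNCE averaged over image->text and text->image.

    Assumption: images[i] and texts[i] form a matched pair; every other
    pairing in the batch acts as a negative.
    """
    img = [l2_normalize(v) for v in images]
    txt = [l2_normalize(v) for v in texts]
    logits = [[dot(a, b) / temperature for b in txt] for a in img]
    logits_t = [list(col) for col in zip(*logits)]  # rows are now texts
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


def select_patches(patches, text_query, k):
    """Hypothetical text-region selection: indices of the k patches with
    the highest cosine similarity to the text query, best first."""
    q = l2_normalize(text_query)
    scores = [dot(l2_normalize(p), q) for p in patches]
    ranked = sorted(range(len(patches)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[0.9, 0.1], [0.1, 0.9]]
    # Close to 0 because each image is most similar to its own caption.
    print(symmetric_contrastive_loss(images, texts))
    patches = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    print(select_patches(patches, [1.0, 0.1], k=2))  # → [0, 2]
```

Hard top-k selection is the simplest way to discard low-relevance patches before cross-modal interaction; a learned or soft, attention-weighted selection is an equally plausible design, and the abstract does not say which one BiMAC uses.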

Cite

Text

Zareapoor et al. "BiMAC: Bidirectional Multimodal Alignment in Contrastive Learning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I21.34384

Markdown

[Zareapoor et al. "BiMAC: Bidirectional Multimodal Alignment in Contrastive Learning." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zareapoor2025aaai-bimac/) doi:10.1609/AAAI.V39I21.34384

BibTeX

@inproceedings{zareapoor2025aaai-bimac,
  title     = {{BiMAC: Bidirectional Multimodal Alignment in Contrastive Learning}},
  author    = {Zareapoor, Masoumeh and Shamsolmoali, Pourya and Lu, Yue},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {22290--22298},
  doi       = {10.1609/AAAI.V39I21.34384},
  url       = {https://mlanthology.org/aaai/2025/zareapoor2025aaai-bimac/}
}