Model Alignment Using Inter-Modal Bridges
Abstract
Foundation models have demonstrated remarkable performance across modalities such as language and vision. However, inter-modal model reuse remains limited due to the difficulty of aligning internal representations. Existing methods require extensive paired training data or are constrained to specific domains. We introduce a semi-supervised approach for model alignment via conditional flow matching. The conditional flow between latent spaces of different modalities (e.g., text-to-image or biological-to-artificial neuronal activity) can be learned in two settings: (1) solving a (balanced or unbalanced) optimal transport problem with an inter-space bridge cost, and (2) performing memory-efficient alignment using labelled exemplars. Despite being constrained by the original models' capacity, our method, under both settings, matches the downstream task performance of end-to-end trained models on object recognition and image generation tasks across the MNIST, ImageNet, and Majaj et al. (2015) datasets, particularly when labelled training data is scarce (<20%). Our method provides a data-efficient solution for inter-modal model alignment with minimal supervision.
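To make setting (1) concrete, the sketch below shows one way a minibatch optimal-transport coupling can be combined with conditional flow matching between two latent spaces. It is a minimal illustration, not the authors' implementation: the names (`VelocityField`, `cfm_step`, `proj`, `d_src`, `d_tgt`) are hypothetical, the squared-Euclidean cost after a learned linear projection is an assumed stand-in for the paper's inter-space bridge cost, and the POT library supplies the transport solver.

```python
# Illustrative sketch only: minibatch OT-coupled conditional flow matching
# between two latent spaces. The cost below is an assumed stand-in for the
# paper's inter-space bridge cost, not the authors' implementation.
import numpy as np
import ot  # POT: Python Optimal Transport
import torch
import torch.nn as nn


class VelocityField(nn.Module):
    """v_theta(t, x): velocity field over the target latent space."""

    def __init__(self, d_tgt: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_tgt + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, d_tgt),
        )

    def forward(self, t: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, t], dim=-1))


def cfm_step(v_theta, proj, optimizer, z_src, z_tgt):
    """One training step: OT-couple the minibatch across modalities, then
    regress the straight-line velocity along the linear interpolant."""
    n, m = len(z_src), len(z_tgt)
    x0 = proj(z_src)  # map source latents into the target space
    # Bridge cost (assumed form): squared Euclidean distance after projection.
    cost = torch.cdist(x0, z_tgt).pow(2)
    # Balanced OT plan with uniform marginals; for the unbalanced variant,
    # ot.sinkhorn_unbalanced could be substituted here.
    plan = ot.emd(np.full(n, 1 / n), np.full(m, 1 / m),
                  cost.detach().cpu().numpy().astype(np.float64))
    x1 = z_tgt[torch.as_tensor(plan.argmax(axis=1))]  # hard matching

    t = torch.rand(n, 1)
    x_t = (1 - t) * x0 + t * x1  # linear interpolation path
    u_t = x1 - x0                # target (conditional) velocity
    loss = (v_theta(t, x_t) - u_t).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A hypothetical training loop would instantiate `v_theta = VelocityField(d_tgt)` and `proj = nn.Linear(d_src, d_tgt)` and optimize their parameters jointly. In setting (2), the OT coupling would be replaced by pairs of latents matched through shared labels on the exemplars, removing the cost matrix and the transport solve altogether.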
Cite
Text
Gholamzadeh and Sajid. "Model Alignment Using Inter-Modal Bridges." ICLR 2025 Workshops: Re-Align, 2025.
Markdown
[Gholamzadeh and Sajid. "Model Alignment Using Inter-Modal Bridges." ICLR 2025 Workshops: Re-Align, 2025.](https://mlanthology.org/iclrw/2025/gholamzadeh2025iclrw-model/)
BibTeX
@inproceedings{gholamzadeh2025iclrw-model,
  title = {{Model Alignment Using Inter-Modal Bridges}},
  author = {Gholamzadeh, Ali and Sajid, Noor},
  booktitle = {ICLR 2025 Workshops: Re-Align},
  year = {2025},
  url = {https://mlanthology.org/iclrw/2025/gholamzadeh2025iclrw-model/}
}