Multimodal Lego: Model Merging and Fine-Tuning Across Topologies and Modalities

Abstract

Learning holistic computational representations in physical, chemical or biological systems requires the ability to process information from different distributions and modalities within the same model. While many multimodal fusion and alignment approaches are available, most of them require end-to-end training, scale quadratically with the number of modalities, cannot handle cases of high modality imbalance in the training set, or are highly topology-specific, making them too restrictive for many biomedical learning tasks. This paper presents _Multimodal Lego_ (MM-Lego), a general-purpose fusion framework to turn any set of encoders into a competitive multimodal model with no or minimal fine-tuning. We achieve this by introducing a wrapper for any unimodal encoder that enforces shape consistency between modality representations and harmonises these representations by learning features in the frequency domain to enable model merging with little signal interference. We show that MM-Lego 1) can be used as a _model merging_ method which achieves competitive performance with end-to-end fusion models _without any fine-tuning_, 2) can operate on any unimodal encoder, and 3) is a _model fusion_ method that, with minimal fine-tuning, achieves state-of-the-art results on six benchmarked multimodal biomedical tasks.
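To make the two ideas in the abstract concrete (shape-consistent latents and merging in the frequency domain), the following is a minimal NumPy sketch and not the authors' implementation. Everything named here is hypothetical: `to_fixed_shape` stands in for a learned wrapper using a fixed random projection, and `merge_in_frequency_domain` assumes one possible aggregation rule (harmonic mean of spectral magnitudes, circular mean of phases) that the abstract itself does not specify.

```python
import numpy as np


def to_fixed_shape(features, latent_shape, seed):
    """Hypothetical stand-in for a learned wrapper.

    Maps encoder output of any shape to a shared latent shape using a
    fixed random linear projection (an assumption for illustration only).
    """
    rng = np.random.default_rng(seed)
    flat = np.asarray(features, dtype=float).reshape(-1)
    rows, cols = latent_shape
    proj = rng.standard_normal((rows * cols, flat.size)) / np.sqrt(flat.size)
    return (proj @ flat).reshape(rows, cols)


def merge_in_frequency_domain(latents, eps=1e-8):
    """Merge same-shaped real latents via their 2D Fourier spectra.

    Assumed aggregation rule (not taken from the abstract): harmonic mean
    of the magnitudes and circular mean of the phases.
    """
    spectra = np.stack([np.fft.fft2(z) for z in latents])
    magnitudes = np.abs(spectra)
    # Harmonic mean of magnitudes; eps guards against division by zero.
    merged_mag = len(latents) / np.sum(1.0 / (magnitudes + eps), axis=0)
    # Circular mean of phases: angle of the summed unit phasors.
    merged_phase = np.angle(np.sum(np.exp(1j * np.angle(spectra)), axis=0))
    merged_spectrum = merged_mag * np.exp(1j * merged_phase)
    # Discard the residual imaginary part after the inverse transform.
    return np.fft.ifft2(merged_spectrum).real


# Two toy "modalities" with different topologies: an image-like tensor
# and a tabular vector, both mapped to the same latent shape.
image_latent = to_fixed_shape(np.ones((8, 8, 3)), (4, 16), seed=0)
table_latent = to_fixed_shape(np.arange(20.0), (4, 16), seed=1)
merged = merge_in_frequency_domain([image_latent, table_latent])
```

Because both latents share one shape, the merge is a purely element-wise operation on their spectra and needs no joint training in this sketch. A plain arithmetic mean of the complex spectra would be equivalent to averaging the latents directly, which is why the sketch treats magnitude and phase separately.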

Cite

Text

Hemker et al. "Multimodal Lego: Model Merging and Fine-Tuning Across Topologies and Modalities." NeurIPS 2024 Workshops: AIM-FM, 2024.

Markdown

[Hemker et al. "Multimodal Lego: Model Merging and Fine-Tuning Across Topologies and Modalities." NeurIPS 2024 Workshops: AIM-FM, 2024.](https://mlanthology.org/neuripsw/2024/hemker2024neuripsw-multimodal/)

BibTeX

@inproceedings{hemker2024neuripsw-multimodal,
  title     = {{Multimodal Lego: Model Merging and Fine-Tuning Across Topologies and Modalities}},
  author    = {Hemker, Konstantin and Simidjievski, Nikola and Jamnik, Mateja},
  booktitle = {NeurIPS 2024 Workshops: AIM-FM},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/hemker2024neuripsw-multimodal/}
}