LLMs Can Evolve Continually on Modality for $\mathbb{X}$-Modal Reasoning
Abstract
Multimodal Large Language Models (MLLMs) have gained significant attention due to their impressive capabilities in multimodal understanding. However, existing methods rely heavily on extensive modal-specific pretraining and joint-modal tuning, leading to considerable computational burdens when expanding to new modalities. In this paper, we propose \textbf{PathWeave}, a flexible and scalable framework with modal-\textbf{path} s\textbf{w}itching and \textbf{e}xp\textbf{a}nsion abilities that enables MLLMs to continually \textbf{ev}olve on modalities for $\mathbb{X}$-modal reasoning. We leverage the concept of Continual Learning and develop an incremental training strategy atop pre-trained MLLMs, enabling their expansion to new modalities using uni-modal data, without executing joint-modal pretraining. In detail, a novel Adapter-in-Adapter (AnA) framework is introduced, in which uni-modal and cross-modal adapters are seamlessly integrated to facilitate efficient modality alignment and collaboration. Additionally, an MoE-based gating module is applied between the two types of adapters to further enhance multimodal interaction. To investigate the proposed method, we establish a challenging benchmark called \textbf{C}ontinual \textbf{L}earning of \textbf{M}odality (MCL), which consists of high-quality QA data from five distinct modalities: image, video, audio, depth, and point cloud. Extensive experiments demonstrate the effectiveness of the proposed AnA framework in terms of learning plasticity and memory stability during continual learning. Furthermore, PathWeave performs comparably to state-of-the-art MLLMs while reducing parameter training burdens by 98.73\%. Our code is available at \url{https://github.com/JiazuoYu/PathWeave}.
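To make the architecture concrete, below is a minimal PyTorch sketch of how an Adapter-in-Adapter block with an MoE-style gate might be wired: a trainable uni-modal adapter for the current modality sits alongside frozen cross-modal adapters inherited from previously learned modalities, and a per-token gate mixes their outputs. All class names, the bottleneck width, and the gating granularity are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Standard bottleneck adapter: down-project, non-linearity, up-project."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.act(self.down(x)))


class AdapterInAdapter(nn.Module):
    """Illustrative Adapter-in-Adapter block (not the official code).

    One trainable uni-modal adapter for the current modality, plus frozen
    cross-modal adapters inherited from earlier modalities, combined by an
    MoE-style gate computed per token.
    """

    def __init__(self, dim: int, num_prev_modalities: int, bottleneck: int = 64):
        super().__init__()
        # Trainable adapter for the modality currently being learned.
        self.uni_adapter = Adapter(dim, bottleneck)
        # Adapters carried over from earlier modalities, frozen to
        # preserve previously acquired knowledge (memory stability).
        self.cross_adapters = nn.ModuleList(
            Adapter(dim, bottleneck) for _ in range(num_prev_modalities)
        )
        for p in self.cross_adapters.parameters():
            p.requires_grad_(False)
        # Gate scores every expert: the frozen adapters plus the new one.
        self.gate = nn.Linear(dim, num_prev_modalities + 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        experts = [a(x) for a in self.cross_adapters] + [self.uni_adapter(x)]
        stacked = torch.stack(experts, dim=-1)           # (B, T, dim, E)
        weights = torch.softmax(self.gate(x), dim=-1)    # (B, T, E)
        mixed = (stacked * weights.unsqueeze(-2)).sum(dim=-1)
        return x + mixed                                 # residual connection
```

Freezing the inherited adapters keeps old-modality knowledge intact, while the small trainable uni-modal adapter and gate supply plasticity for the new modality; since only these lightweight modules are updated, the design mirrors the parameter-efficiency claim in the abstract.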
Cite
Text
Yu et al. "LLMs Can Evolve Continually on Modality for $\mathbb{X}$-Modal Reasoning." Neural Information Processing Systems, 2024. doi:10.52202/079017-1578
Markdown
[Yu et al. "LLMs Can Evolve Continually on Modality for $\mathbb{X}$-Modal Reasoning." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/yu2024neurips-llms/) doi:10.52202/079017-1578
BibTeX
@inproceedings{yu2024neurips-llms,
title = {{LLMs Can Evolve Continually on Modality for $\mathbb{X}$-Modal Reasoning}},
author = {Yu, Jiazuo and Xiong, Haomiao and Zhang, Lu and Diao, Haiwen and Zhuge, Yunzhi and Hong, Lanqing and Wang, Dong and Lu, Huchuan and He, You and Chen, Long},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-1578},
url = {https://mlanthology.org/neurips/2024/yu2024neurips-llms/}
}