mPLUG-Owl2: Revolutionizing Multi-Modal Large Language Model with Modality Collaboration

Abstract

Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks. However, previous methods have primarily focused on enhancing multi-modal capabilities. In this work, we introduce a versatile multi-modal large language model, mPLUG-Owl2, which effectively leverages modality collaboration to improve performance in both text and multi-modal tasks. mPLUG-Owl2 utilizes a modularized network design, with the language decoder acting as a universal interface for managing different modalities. Specifically, mPLUG-Owl2 incorporates shared functional modules to facilitate modality collaboration and introduces a modality-adaptive module that preserves modality-specific features. Extensive experiments reveal that mPLUG-Owl2 is capable of generalizing to both text tasks and multi-modal tasks while achieving state-of-the-art performance with a single generalized model. Notably, mPLUG-Owl2 is the first MLLM that demonstrates the modality collaboration phenomenon in both pure-text and multi-modal scenarios, setting a pioneering path in the development of future multi-modal foundation models.

Cite

Text

Ye et al. "mPLUG-Owl2: Revolutionizing Multi-Modal Large Language Model with Modality Collaboration." Conference on Computer Vision and Pattern Recognition, 2024.

Markdown

[Ye et al. "mPLUG-Owl2: Revolutionizing Multi-Modal Large Language Model with Modality Collaboration." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/ye2024cvpr-mplugowl2/)

BibTeX

@inproceedings{ye2024cvpr-mplugowl2,
  title     = {{mPLUG-Owl2: Revolutionizing Multi-Modal Large Language Model with Modality Collaboration}},
  author    = {Ye, Qinghao and Xu, Haiyang and Ye, Jiabo and Yan, Ming and Hu, Anwen and Liu, Haowei and Qian, Qi and Zhang, Ji and Huang, Fei},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {13040-13051},
  url       = {https://mlanthology.org/cvpr/2024/ye2024cvpr-mplugowl2/}
}