Querying as Prompt: Parameter-Efficient Learning for Multimodal Language Model

Abstract

Recent advances in language models pre-trained on large-scale corpora have significantly propelled NLP and advanced progress in multimodal tasks. In this paper, we propose a parameter-efficient learning strategy for multimodal language models named QaP (Querying as Prompt). Its core innovation is a novel modality-bridging method that feeds a set of modality-specific queries as soft prompts into a frozen pre-trained language model. Specifically, we introduce an efficient Text-Conditioned Resampler that is easy to incorporate into language models and that, through query learning, adaptively injects text-related multimodal information at different levels of the model. This approach effectively bridges multimodal information to the language model while fully leveraging its token fusion and representation potential. We validate our method on four datasets across three distinct multimodal tasks. The results demonstrate that our QaP multimodal language model achieves state-of-the-art performance on various tasks while training only 4.6% of the parameters.
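To make the "queries as soft prompts" idea concrete, the toy sketch below shows one way a small set of learnable queries could be conditioned on text, cross-attend over visual features, and be prepended to a language model's token embeddings. It is an illustration under stated assumptions, not the paper's implementation: the function names (`text_conditioned_queries`, `resample`), the mean-pooled text conditioning, the single attention head, the dimensions, and the random stand-in features are all hypothetical, and the sketch injects prompts only at the input rather than at several levels of the model.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng, scale=0.1):
    """Random Gaussian matrix used as a stand-in for real features or weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def text_conditioned_queries(queries, text_emb):
    """Shift each query by the mean text embedding (an assumed, simple conditioning)."""
    dim = len(text_emb[0])
    mean = [sum(tok[d] for tok in text_emb) / len(text_emb) for d in range(dim)]
    return [[q[d] + mean[d] for d in range(dim)] for q in queries]

def resample(queries, features):
    """Single-head cross-attention: each query takes a weighted average of the features."""
    dim = len(queries[0])
    features_t = [list(col) for col in zip(*features)]       # (dim, n_feat)
    scores = matmul(queries, features_t)                     # (n_queries, n_feat)
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, features)                         # (n_queries, dim)

rng = random.Random(0)
dim, n_queries, n_patches, n_tokens = 8, 4, 16, 5
queries = rand_matrix(n_queries, dim, rng)      # the only trainable tensor in this toy
image_feats = rand_matrix(n_patches, dim, rng)  # stand-in for frozen visual encoder output
text_emb = rand_matrix(n_tokens, dim, rng)      # stand-in for frozen LM token embeddings

soft_prompts = resample(text_conditioned_queries(queries, text_emb), image_feats)
lm_input = soft_prompts + text_emb              # prepend the prompts to the token sequence
print(len(lm_input), len(lm_input[0]))          # prints "9 8"
```

In this toy setup only `queries` would receive gradients, while the visual features and token embeddings stay fixed, which is the general sense in which prompt-style bridging can keep the trainable parameter count small.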

Cite

Text

Liang et al. "Querying as Prompt: Parameter-Efficient Learning for Multimodal Language Model." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.02536

Markdown

[Liang et al. "Querying as Prompt: Parameter-Efficient Learning for Multimodal Language Model." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/liang2024cvpr-querying/) doi:10.1109/CVPR52733.2024.02536

BibTeX

@inproceedings{liang2024cvpr-querying,
  title     = {{Querying as Prompt: Parameter-Efficient Learning for Multimodal Language Model}},
  author    = {Liang, Tian and Huang, Jing and Kong, Ming and Chen, Luyuan and Zhu, Qiang},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {26855--26865},
  doi       = {10.1109/CVPR52733.2024.02536},
  url       = {https://mlanthology.org/cvpr/2024/liang2024cvpr-querying/}
}