Prompt Learning with Quaternion Networks

Abstract

Multimodal pre-trained models have shown impressive potential in enhancing performance on downstream tasks. However, existing fusion strategies for modalities primarily rely on explicit interaction structures that fail to capture the diverse aspects and patterns inherent in input data. This yields limited performance in zero-shot contexts, especially when fine-grained classifications and abstract interpretations are required. To address this, we propose an effective approach, namely Prompt Learning with Quaternion Networks (QNet), for semantic alignment across diverse modalities. QNet employs a quaternion hidden space where the mutually orthogonal imaginary axes capture rich intermodal semantic spatial correlations from various perspectives. Hierarchical features across multiple layers are utilized to encode intricate interdependencies among the modalities with fewer parameters. Our experiments on 11 datasets demonstrate that QNet outperforms state-of-the-art prompt learning techniques in base-to-novel generalization, cross-dataset transfer, and domain transfer scenarios with fewer learnable parameters. The source code is available at https://github.com/VISION-SJTU/QNet.
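To make the quaternion hidden space concrete, below is a minimal PyTorch sketch (not the authors' released code) of a quaternion linear layer built on the Hamilton product, the basic building block of quaternion networks. The hidden width, initialization scale, and the assumption that prompt features are split evenly into the real part and the three imaginary axes are all illustrative choices, not details taken from the paper.

import torch
import torch.nn as nn


class QuaternionLinear(nn.Module):
    """Linear map in quaternion space via the Hamilton product x (x) W.

    Features are split into four equal parts acting as the real component and
    the three mutually orthogonal imaginary axes (i, j, k). Because the four
    component weights are shared across the Hamilton product terms, a layer of
    the same width uses roughly a quarter of the parameters of a real-valued
    linear layer, which is the usual source of quaternion networks' parameter
    savings.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        assert in_features % 4 == 0 and out_features % 4 == 0
        self.in_q, self.out_q = in_features // 4, out_features // 4
        # One weight matrix per quaternion component (r, i, j, k).
        self.w_r = nn.Parameter(torch.randn(self.out_q, self.in_q) * 0.02)
        self.w_i = nn.Parameter(torch.randn(self.out_q, self.in_q) * 0.02)
        self.w_j = nn.Parameter(torch.randn(self.out_q, self.in_q) * 0.02)
        self.w_k = nn.Parameter(torch.randn(self.out_q, self.in_q) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the input into its four quaternion components.
        x_r, x_i, x_j, x_k = torch.chunk(x, 4, dim=-1)
        # Hamilton product x (x) W, written out component by component.
        out_r = x_r @ self.w_r.T - x_i @ self.w_i.T - x_j @ self.w_j.T - x_k @ self.w_k.T
        out_i = x_r @ self.w_i.T + x_i @ self.w_r.T + x_j @ self.w_k.T - x_k @ self.w_j.T
        out_j = x_r @ self.w_j.T - x_i @ self.w_k.T + x_j @ self.w_r.T + x_k @ self.w_i.T
        out_k = x_r @ self.w_k.T + x_i @ self.w_j.T - x_j @ self.w_i.T + x_k @ self.w_r.T
        return torch.cat([out_r, out_i, out_j, out_k], dim=-1)


if __name__ == "__main__":
    # Illustrative only: project 512-d prompt features through a quaternion
    # hidden space of the same width.
    layer = QuaternionLinear(512, 512)
    prompts = torch.randn(8, 512)
    print(layer(prompts).shape)  # torch.Size([8, 512])

Because the Hamilton product mixes all four components through a shared set of weights, each imaginary axis sees the others' features from a different "perspective", which is one way to read the paper's claim about orthogonal axes capturing intermodal correlations; how QNet actually wires prompts from the text and image encoders into these components is described in the paper and repository, not here.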

Cite

Text

Shi et al. "Prompt Learning with Quaternion Networks." International Conference on Learning Representations, 2024.

Markdown

[Shi et al. "Prompt Learning with Quaternion Networks." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/shi2024iclr-prompt/)

BibTeX

@inproceedings{shi2024iclr-prompt,
  title     = {{Prompt Learning with Quaternion Networks}},
  author    = {Shi, Boya and Xu, Zhengqin and Jia, Shuai and Ma, Chao},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/shi2024iclr-prompt/}
}