Llama-Excitor: General Instruction Tuning via Indirect Feature Interaction

Abstract

Existing methods to fine-tune LLMs, such as Adapter, Prefix-tuning, and LoRA, which introduce extra modules or additional input sequences to inject new skills or knowledge, may compromise the innate abilities of LLMs. In this paper, we propose LLaMA-Excitor, a lightweight method that stimulates the LLMs' potential to better follow instructions by gradually paying more attention to worthwhile information. Specifically, LLaMA-Excitor does not directly change the intermediate hidden states during the self-attention calculation of the transformer structure. We design the Excitor block as a bypass module for the similarity-score computation in LLMs' self-attention, reconstructing keys and changing the importance of values through learnable prompts. LLaMA-Excitor ensures a self-adaptive allocation of additional attention to input instructions, thus effectively preserving LLMs' pre-trained knowledge when fine-tuning on low-quality instruction-following datasets. Furthermore, we unify the modeling of multi-modal tuning and language-only tuning, extending LLaMA-Excitor into a powerful visual instruction follower without the need for complex multi-modal alignment. Our proposed approach is evaluated in language-only and multi-modal tuning scenarios. Notably, LLaMA-Excitor is the only method that maintains basic capabilities while achieving a significant improvement (+6%) on the MMLU benchmark. In visual instruction tuning, we achieve a new state-of-the-art image captioning performance of 157.5 CIDEr on MSCOCO, and a comparable performance (88.39%) on ScienceQA to cutting-edge models with more parameters and extensive vision-language pretraining.
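The abstract only outlines the mechanism, so here is a minimal, hypothetical PyTorch sketch of the idea it describes: a bypass branch that reconstructs attention keys with learnable prompts, so values (and the hidden states) are never modified directly, only re-weighted through the similarity scores. All names here (`ExcitorSelfAttention`, `prompt_len`, the gated key reconstruction) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExcitorSelfAttention(nn.Module):
    """Single-head self-attention with an Excitor-style bypass (illustrative sketch)."""

    def __init__(self, dim: int, prompt_len: int = 16):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        # Learnable prompts, used only in the bypass branch to reconstruct keys.
        self.prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        # Zero-initialized gate: at the start of fine-tuning the bypass contributes
        # nothing, so the frozen model's behavior is preserved exactly.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scale = q.size(-1) ** -0.5

        # Bypass: reconstruct keys by cross-attending them to the learnable prompts.
        p = self.prompt.unsqueeze(0).expand(x.size(0), -1, -1)        # (B, P, D)
        prompt_attn = F.softmax(k @ p.transpose(-2, -1) * scale, dim=-1)
        k = k + torch.tanh(self.gate) * (prompt_attn @ p)              # reconstructed keys

        # Standard attention with the reconstructed keys: values are untouched;
        # only their importance (the similarity scores) changes.
        scores = F.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
        return scores @ v
```

Because the gate is zero-initialized, the module is an identity perturbation of vanilla attention at the start of training, which is one plausible way to realize the abstract's claim of preserving pre-trained knowledge while gradually shifting attention toward worthwhile information.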

Cite

Text

Zou et al. "Llama-Excitor: General Instruction Tuning via Indirect Feature Interaction." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01336

Markdown

[Zou et al. "Llama-Excitor: General Instruction Tuning via Indirect Feature Interaction." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/zou2024cvpr-llamaexcitor/) doi:10.1109/CVPR52733.2024.01336

BibTeX

@inproceedings{zou2024cvpr-llamaexcitor,
  title     = {{Llama-Excitor: General Instruction Tuning via Indirect Feature Interaction}},
  author    = {Zou, Bo and Yang, Chao and Qiao, Yu and Quan, Chengbin and Zhao, Youjian},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {14089--14099},
  doi       = {10.1109/CVPR52733.2024.01336},
  url       = {https://mlanthology.org/cvpr/2024/zou2024cvpr-llamaexcitor/}
}