Llama-Excitor: General Instruction Tuning via Indirect Feature Interaction
Abstract
Existing methods for fine-tuning LLMs, such as Adapter, Prefix-tuning, and LoRA, introduce extra modules or additional input sequences to inject new skills or knowledge, which may compromise the innate abilities of LLMs. In this paper, we propose LLaMA-Excitor, a lightweight method that stimulates the LLMs' potential to better follow instructions by gradually paying more attention to worthwhile information. Specifically, LLaMA-Excitor does not directly change the intermediate hidden states during the self-attention calculation of the transformer structure. We design the Excitor block as a bypass module for the similarity-score computation in LLMs' self-attention, which reconstructs keys and changes the importance of values via learnable prompts. LLaMA-Excitor ensures a self-adaptive allocation of additional attention to input instructions, thus effectively preserving LLMs' pre-trained knowledge when fine-tuning on low-quality instruction-following datasets. Furthermore, we unify the modeling of multi-modal tuning and language-only tuning, extending LLaMA-Excitor to a powerful visual instruction follower without the need for complex multi-modal alignment. Our approach is evaluated in both language-only and multi-modal tuning scenarios. Notably, LLaMA-Excitor is the only method that maintains basic capabilities while achieving a significant improvement (+6%) on the MMLU benchmark. In visual instruction tuning, we achieve a new state-of-the-art image-captioning performance of 157.5 CIDEr on MSCOCO and comparable performance (88.39%) on ScienceQA to cutting-edge models with more parameters and extensive vision-language pre-training.
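To make the mechanism described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of an Excitor-style bypass. The class name ExcitorAttention, the prompt count, the zero-initialized gate, and the exact key-reconstruction step are all our assumptions for illustration, not the authors' released implementation. The idea it illustrates: learnable prompts reconstruct keys in a side branch that adds a correction to the attention similarity scores, while the frozen pretrained projections and the value path are left untouched, so hidden states are only re-weighted rather than rewritten.

import torch
import torch.nn as nn

class ExcitorAttention(nn.Module):
    """Hypothetical sketch of an Excitor-style bypass (names/shapes assumed).

    A small set of learnable prompts yields an additive correction to the
    attention similarity scores; the value path is untouched, so the frozen
    LLM's hidden states are re-weighted, not directly modified.
    """
    def __init__(self, dim: int, n_heads: int, n_prompts: int = 16):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        # Frozen pretrained projections (weights would be loaded from the LLM).
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        for proj in (self.q_proj, self.k_proj, self.v_proj):
            proj.weight.requires_grad = False
        # Learnable prompts used to reconstruct keys in the bypass branch.
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        # Zero-initialized gate: training starts from the unmodified model.
        self.gate = nn.Parameter(torch.zeros(n_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        H, D = self.n_heads, self.head_dim
        q = self.q_proj(x).view(B, T, H, D).transpose(1, 2)  # (B, H, T, D)
        k = self.k_proj(x).view(B, T, H, D).transpose(1, 2)
        v = self.v_proj(x).view(B, T, H, D).transpose(1, 2)

        # Main branch: ordinary similarity scores (causal mask omitted for brevity).
        scores = q @ k.transpose(-2, -1) / D ** 0.5          # (B, H, T, T)

        # Bypass branch: keys reconstructed from learnable prompts. The
        # prompt-attended reconstruction re-weights how much each position's
        # value contributes, without modifying the values themselves.
        pk = self.prompts.view(1, -1, H, D).transpose(1, 2)  # (1, H, P, D)
        p_scores = q @ pk.transpose(-2, -1) / D ** 0.5       # (B, H, T, P)
        recon = p_scores.softmax(-1) @ pk                    # (B, H, T, D)
        extra = recon @ k.transpose(-2, -1) / D ** 0.5       # (B, H, T, T)
        scores = scores + torch.tanh(self.gate).view(1, H, 1, 1) * extra

        out = scores.softmax(-1) @ v                         # (B, H, T, D)
        return out.transpose(1, 2).reshape(B, T, C)

Note that with the gate initialized to zero, tanh(0) = 0 and the module reproduces the frozen model's attention exactly at the start of training; attention is then shifted only gradually, which is one plausible way to realize the paper's stated goal of preserving pre-trained knowledge while fine-tuning.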
Cite
Zou et al. "Llama-Excitor: General Instruction Tuning via Indirect Feature Interaction." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01336

BibTeX
@inproceedings{zou2024cvpr-llamaexcitor,
title = {{Llama-Excitor: General Instruction Tuning via Indirect Feature Interaction}},
author = {Zou, Bo and Yang, Chao and Qiao, Yu and Quan, Chengbin and Zhao, Youjian},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {14089--14099},
doi = {10.1109/CVPR52733.2024.01336},
url = {https://mlanthology.org/cvpr/2024/zou2024cvpr-llamaexcitor/}
}