Sparse Is Enough in Fine-Tuning Pre-Trained Large Language Models

Abstract

With the prevalence of the pre-training-fine-tuning paradigm, how to efficiently adapt pre-trained models to downstream tasks has become an important question. $\textbf{P}$arameter-$\textbf{E}$fficient $\textbf{F}$ine-$\textbf{T}$uning (PEFT) methods have been proposed for low-cost adaptation. Although PEFT has demonstrated its effectiveness and been widely applied, its underlying principles remain unclear. In this paper, we adopt the PAC-Bayesian generalization error bound, viewing pre-training as a shift of the prior distribution that leads to a tighter bound on the generalization error. We validate this shift from the perspectives of oscillations in the loss landscape and quasi-sparsity in the gradient distribution. Building on this, we propose a gradient-based sparse fine-tuning algorithm, named $\textbf{S}$parse $\textbf{I}$ncrement $\textbf{F}$ine-$\textbf{T}$uning (SIFT), and validate its effectiveness on a range of tasks including the GLUE benchmark and instruction tuning. The code is available at https://github.com/song-wx/SIFT/.
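To make the idea of gradient-based sparse fine-tuning concrete, below is a minimal sketch, assuming a PyTorch-style training loop: gradient magnitudes are estimated on a calibration batch, only the largest-magnitude entries of each weight tensor are kept trainable, and updates are applied through that fixed mask. The helper names (build_sparse_masks, sparse_step) and the plain SGD update are illustrative assumptions, not the released SIFT implementation; see the repository linked above for the authors' code.

```python
# Illustrative sketch of gradient-based sparse fine-tuning.
# NOT the authors' SIFT code: mask selection and the SGD-style update
# here are simplifying assumptions for exposition only.
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_sparse_masks(model: nn.Module, loss: torch.Tensor, density: float = 0.01):
    """Backprop one calibration loss and keep the top-|grad| entries per parameter."""
    loss.backward()
    masks = {}
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        k = max(1, int(density * p.numel()))
        # Threshold at the k-th largest absolute gradient value of this tensor.
        thresh = p.grad.abs().flatten().topk(k).values.min()
        masks[name] = p.grad.abs() >= thresh
        p.grad = None
    return masks

def sparse_step(model: nn.Module, masks: dict, lr: float = 1e-4):
    """Apply a plain SGD update only on the masked (selected) entries."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is None or name not in masks:
                continue
            p -= lr * p.grad * masks[name]  # boolean mask zeroes the untouched entries
            p.grad = None

# Toy usage on a placeholder model and batch (not from the paper):
model = nn.Linear(16, 2)
x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
masks = build_sparse_masks(model, F.cross_entropy(model(x), y), density=0.05)
for _ in range(10):
    F.cross_entropy(model(x), y).backward()
    sparse_step(model, masks, lr=1e-2)
```

Only the masked entries ever change, so the number of updated parameters is controlled directly by the density argument rather than by adding new modules, which is the sense in which the increment is sparse.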

Cite

Text

Song et al. "Sparse Is Enough in Fine-Tuning Pre-Trained Large Language Models." International Conference on Machine Learning, 2024.

Markdown

[Song et al. "Sparse Is Enough in Fine-Tuning Pre-Trained Large Language Models." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/song2024icml-sparse/)

BibTeX

@inproceedings{song2024icml-sparse,
  title     = {{Sparse Is Enough in Fine-Tuning Pre-Trained Large Language Models}},
  author    = {Song, Weixi and Li, Zuchao and Zhang, Lefei and Zhao, Hai and Du, Bo},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {46121--46135},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/song2024icml-sparse/}
}