Sparse Structure Search for Delta Tuning

Abstract

Adapting large pre-trained models (PTMs) through fine-tuning imposes prohibitive computational and storage burdens. Recent studies of delta tuning (DT), i.e., parameter-efficient tuning, find that optimizing only a small portion of parameters conditioned on PTMs can match the performance of conventional fine-tuning. Generally, DT methods carefully design delta modules (DT modules) that can be applied at arbitrary fine-grained positions inside PTMs. However, the effectiveness of these fine-grained positions relies largely on sophisticated manual designation, which usually produces sub-optimal results. In contrast to manual designation, we explore constructing DT modules automatically. We automatically \textbf{S}earch for the \textbf{S}parse \textbf{S}tructure of \textbf{Delta} Tuning (S$^3$Delta). Based on a unified framework of various DT methods, S$^3$Delta conducts a differentiable DT structure search through bi-level optimization and proposes a shifted global sigmoid method to explicitly control the number of trainable parameters. Extensive experiments show that S$^3$Delta surpasses manual and random structures with fewer trainable parameters. The searched structures preserve more than 99\% of fine-tuning performance with 0.01\% trainable parameters. Moreover, the advantage of S$^3$Delta is amplified under extremely low trainable-parameter budgets (0.0009\%$\sim$0.01\%). The searched structures are transferable and explainable, providing suggestions and guidance for the future design of DT methods. Our code is publicly available at \url{https://github.com/thunlp/S3Delta}.
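The abstract says that a shifted global sigmoid explicitly controls the number of trainable parameters but does not give its formula. The sketch below is one possible reading, not the authors' implementation. It assumes each candidate position has a learnable logit, and that a single global shift is chosen so that the expected number of trainable parameters meets a budget. The names `alphas`, `sizes`, `budget`, `expected_params`, and `find_shift`, and the bisection solver, are all illustrative assumptions.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def expected_params(alphas, sizes, shift):
    # Expected trainable-parameter count when position i is activated
    # with probability sigmoid(alpha_i - shift). (Assumed formulation.)
    return sum(n * sigmoid(a - shift) for a, n in zip(alphas, sizes))


def find_shift(alphas, sizes, budget, lo=-50.0, hi=50.0, iters=100):
    # The expected count decreases monotonically as the shift grows,
    # so bisection finds the shift that meets the budget.
    if not 0 < budget < sum(sizes):
        raise ValueError("budget must lie strictly between 0 and sum(sizes)")
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if expected_params(alphas, sizes, mid) > budget:
            lo = mid  # too many expected parameters: shift further
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical example: four candidate positions with made-up logits and sizes.
alphas = [2.0, 0.5, -1.0, 0.0]
sizes = [1000, 2000, 500, 1500]
budget = 1000

shift = find_shift(alphas, sizes, budget)
probs = [sigmoid(a - shift) for a in alphas]
```

Because the same shift is subtracted from every logit, the ranking of positions is unchanged. Only the overall sparsity level moves, which is what lets a single scalar enforce the budget in this sketch.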

Cite

Text

Hu et al. "Sparse Structure Search for Delta Tuning." Neural Information Processing Systems, 2022.

Markdown

[Hu et al. "Sparse Structure Search for Delta Tuning." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/hu2022neurips-sparse/)

BibTeX

@inproceedings{hu2022neurips-sparse,
  title     = {{Sparse Structure Search for Delta Tuning}},
  author    = {Hu, Shengding and Zhang, Zhen and Ding, Ning and Wang, Yadao and Wang, Yasheng and Liu, Zhiyuan and Sun, Maosong},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/hu2022neurips-sparse/}
}