Sheared Backpropagation for Fine-Tuning Foundation Models

Abstract

Fine-tuning extends the training of pre-trained models on specific target tasks, significantly enhancing their performance across various applications. However, fine-tuning often demands a large amount of memory, posing a challenge for low-memory devices. Some previous memory-efficient fine-tuning methods attempted to mitigate this by pruning the activations used for gradient computation, albeit at the cost of significant computational overhead from the pruning process during training. To address these challenges, we introduce PreBackRazor, a novel activation pruning scheme offering both computational and memory efficiency through a sparsified backpropagation strategy that judiciously avoids unnecessary activation pruning, storage, and gradient computation. Before activation pruning, our approach samples a probability of selecting a portion of parameters to freeze, using a bandit method for updates to prioritize gradients that most affect convergence. During the feed-forward pass, each model layer adapts based on the activation status of its parameters, obviating the need to sparsify and store redundant activations for subsequent backpropagation. Benchmarked on fine-tuning foundation models, our approach maintains baseline accuracy across diverse tasks while yielding over 20% speedup and around 10% memory reduction. Moreover, integrating an advanced CUDA kernel achieves up to 60% speedup without extra memory cost or accuracy loss, significantly enhancing the efficiency of fine-tuning foundation models on memory-constrained devices.
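The two mechanisms the abstract describes, a bandit that samples which parameter blocks to freeze and a feed-forward pass that skips activation storage for frozen blocks, can be illustrated with a minimal sketch. This is not the authors' implementation: the `FreezeBandit` class, its EXP3-style update, and the per-block caching loop below are hypothetical stand-ins showing the general idea under the assumption that one bandit arm corresponds to one parameter block.

```python
import math
import random

class FreezeBandit:
    """Hypothetical EXP3-style bandit: one arm per parameter block.
    Higher weight means the block is more likely to stay trainable."""

    def __init__(self, num_blocks, gamma=0.1):
        self.num_blocks = num_blocks
        self.gamma = gamma          # exploration rate
        self.weights = [1.0] * num_blocks

    def sample_active(self, k):
        """Sample k blocks to keep active, with probability proportional
        to weights mixed with a uniform exploration term."""
        total = sum(self.weights)
        probs = [(1 - self.gamma) * w / total + self.gamma / self.num_blocks
                 for w in self.weights]
        active = set()
        while len(active) < k:
            r, acc = random.random(), 0.0
            for i, p in enumerate(probs):
                acc += p
                if r <= acc:
                    active.add(i)
                    break
            else:
                active.add(self.num_blocks - 1)  # float-rounding fallback
        return active, probs

    def update(self, block, prob, reward):
        """Importance-weighted update; reward could be, e.g., the observed
        gradient magnitude of the block, so impactful blocks stay active."""
        self.weights[block] *= math.exp(
            self.gamma * reward / (prob * self.num_blocks))

def forward_with_selective_caching(blocks, x, active):
    """Feed-forward pass that caches an activation only when the block's
    parameters are active; frozen blocks store nothing, since no gradient
    will be computed for them in backpropagation."""
    cache = {}
    for i, f in enumerate(blocks):
        if i in active:
            cache[i] = x  # input saved for this block's weight gradient
        x = f(x)
    return x, cache

# Toy usage: 3 "layers", keep 2 of 4 tracked blocks trainable per step.
bandit = FreezeBandit(num_blocks=4)
active, probs = bandit.sample_active(k=2)
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
out, cache = forward_with_selective_caching(layers, 5, {0, 2})
bandit.update(0, probs[0], reward=1.0)  # block 0 had a useful gradient
```

The memory saving comes from `cache` holding entries only for active blocks, while the bandit's importance-weighted updates steer activity toward blocks whose gradients matter most for convergence.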

Cite

Text

Yu et al. "Sheared Backpropagation for Fine-Tuning Foundation Models." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00562

Markdown

[Yu et al. "Sheared Backpropagation for Fine-Tuning Foundation Models." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/yu2024cvpr-sheared/) doi:10.1109/CVPR52733.2024.00562

BibTeX

@inproceedings{yu2024cvpr-sheared,
  title     = {{Sheared Backpropagation for Fine-Tuning Foundation Models}},
  author    = {Yu, Zhiyuan and Shen, Li and Ding, Liang and Tian, Xinmei and Chen, Yixin and Tao, Dacheng},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {5883--5892},
  doi       = {10.1109/CVPR52733.2024.00562},
  url       = {https://mlanthology.org/cvpr/2024/yu2024cvpr-sheared/}
}