Selecting Large Language Model to Fine-Tune via Rectified Scaling Law

Abstract

The ever-growing ecosystem of LLMs has posed a challenge in selecting the most appropriate pre-trained model to fine-tune amidst a sea of options. Given constrained resources, fine-tuning all models and making selections afterward is unrealistic. In this work, we formulate this resource-constrained selection task as predicting fine-tuning performance and illustrate its natural connection with scaling laws. Unlike pre-training, we find that the fine-tuning scaling curve includes not just the well-known "power phase" but also the previously unobserved "pre-power phase". We also explain, both theoretically and empirically, why existing scaling laws fail to capture this phase transition phenomenon. To address this, we introduce the concept of "pre-learned data size" into our rectified scaling law, which overcomes theoretical limitations and fits experimental results much better. By leveraging our law, we propose a novel LLM selection algorithm that selects the near-optimal model with hundreds of times less resource consumption.
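The abstract contrasts a classic power-law fit with a rectified law that adds a "pre-learned data size" term. The sketch below illustrates one plausible reading of that idea; the functional forms, symbols (`B`, `beta`, `E`, `D_l`), and parameter values are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch: a vanilla fine-tuning scaling law vs. a "rectified"
# variant with a pre-learned data size term D_l (assumed form).

def vanilla_law(D, B=100.0, beta=0.5, E=1.0):
    # Classic power law: loss decays as a power of fine-tuning data size D.
    return B / D**beta + E

def rectified_law(D, D_l=1000.0, B=100.0, beta=0.5, E=1.0):
    # Rectified form: the model acts as if it had already seen D_l examples,
    # so the curve is nearly flat for D << D_l (a "pre-power phase") and
    # recovers power-law decay once D >> D_l.
    return B / (D_l + D)**beta + E

for D in (10, 100, 1_000, 10_000, 100_000):
    print(D, round(vanilla_law(D), 3), round(rectified_law(D), 3))
```

Under these assumed parameters, the rectified curve changes very little between D = 10 and D = 100 (the pre-power phase), while the vanilla power law predicts a steep drop over the same range.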

Cite

Text

Lin et al. "Selecting Large Language Model to Fine-Tune via Rectified Scaling Law." ICLR 2024 Workshops: ME-FoMo, 2024.

Markdown

[Lin et al. "Selecting Large Language Model to Fine-Tune via Rectified Scaling Law." ICLR 2024 Workshops: ME-FoMo, 2024.](https://mlanthology.org/iclrw/2024/lin2024iclrw-selecting/)

BibTeX

@inproceedings{lin2024iclrw-selecting,
  title     = {{Selecting Large Language Model to Fine-Tune via Rectified Scaling Law}},
  author    = {Lin, Haowei and Huang, Baizhou and Ye, Haotian and Chen, Qinyu and Wang, Zihao and Li, Sujian and Ma, Jianzhu and Wan, Xiaojun and Zou, James and Liang, Yitao},
  booktitle = {ICLR 2024 Workshops: ME-FoMo},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/lin2024iclrw-selecting/}
}