LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs

Abstract

Fine-tuning pre-trained large language models (LLMs) with limited hardware presents challenges due to GPU memory constraints. Various distributed fine-tuning methods have been proposed to alleviate memory constraints on GPU. However, determining the most effective method for achieving rapid fine-tuning while preventing GPU out-of-memory issues in a given environment remains unclear. To address this challenge, we introduce LLMem, a solution that estimates the GPU memory consumption when applying distributed fine-tuning methods across multiple GPUs and identifies the optimal method. We conduct GPU memory estimation prior to fine-tuning, leveraging the fundamental structure of transformer-based decoder models and the memory usage distribution of each method. Experimental results show that LLMem accurately estimates peak GPU memory usage on a single GPU, with error rates of up to 1.6%. Additionally, it shows an average error rate of 3.0% when applying distributed fine-tuning methods to LLMs with more than a billion parameters on multi-GPU setups.
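To give a sense of the kind of accounting such an estimator performs, the sketch below is a generic back-of-the-envelope calculation for full fine-tuning with an AdamW-style optimizer in fp32 (weights + gradients + two optimizer moments per parameter). It is not LLMem's estimator, which additionally models the transformer decoder structure, activation memory, and allocator behavior; the function name and byte-size assumptions here are illustrative only.

```python
def estimate_finetune_memory_gb(num_params: int, bytes_per_param: int = 4) -> float:
    """Rough lower-bound GPU memory for full fine-tuning (illustrative, not LLMem).

    Counts only parameter-proportional state; activations, buffers, and
    allocator fragmentation are ignored.
    """
    weights = num_params * bytes_per_param          # model weights
    grads = num_params * bytes_per_param            # one gradient per weight
    optimizer = num_params * 2 * 4                  # AdamW: two fp32 moments per weight
    total_bytes = weights + grads + optimizer
    return total_bytes / 1024**3                    # bytes -> GiB


# Example: a 1.3B-parameter model in fp32 needs roughly
# 16 bytes per parameter of persistent state.
print(f"{estimate_finetune_memory_gb(1_300_000_000):.1f} GiB")
```

Even this crude bound makes clear why distributed fine-tuning methods matter: persistent training state alone for a billion-parameter model exceeds the capacity of many single consumer GPUs before any activations are allocated.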

Cite

Text

Kim et al. "LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/699

Markdown

[Kim et al. "LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/kim2024ijcai-llmem/) doi:10.24963/ijcai.2024/699

BibTeX

@inproceedings{kim2024ijcai-llmem,
  title     = {{LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs}},
  author    = {Kim, Taeho and Wang, Yanming and Chaturvedi, Vatshank and Gupta, Lokesh and Kim, Seyeon and Kwon, Yongin and Ha, Sangtae},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {6324--6332},
  doi       = {10.24963/ijcai.2024/699},
  url       = {https://mlanthology.org/ijcai/2024/kim2024ijcai-llmem/}
}