Dynamic Operator Optimization for Efficient Multi-Tenant LoRA Model Serving
Abstract
Low-Rank Adaptation (LoRA) has become a popular way to fine-tune large language models (LLMs) with minimal resources. However, serving multiple LoRA models independently results in redundant computation and low GPU utilization. This paper addresses these inefficiencies by introducing Dynamic Operator Optimization (Dop), an automated optimization technique that tunes the Segmented Gather Matrix-Vector Multiplication (SGMV) operator for the serving scenario at hand. SGMV's design enables batching GPU operations for different LoRA models, significantly improving computational efficiency. Dop uses a Search Space Constructor to build a hierarchical search space that divides the program space into high-level structural sketches and low-level implementation details, ensuring diversity and flexibility in operator implementation. An Optimization Engine then refines these implementations using evolutionary search, guided by a cost model that estimates program performance. This iterative process lets SGMV implementations adapt dynamically to different scenarios while maintaining high performance. We demonstrate that Dop can improve throughput by 1.30× to 1.46× in a state-of-the-art (SOTA) multi-tenant LoRA serving system.
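To make the batching idea concrete, here is a minimal pure-Python reference sketch of what an SGMV-style operator computes. It is not the paper's implementation and not a GPU kernel. It assumes the commonly described SGMV semantics, in which the rows of a batch are grouped into contiguous segments and each segment is multiplied by the weights of its own LoRA adapter. The names `sgmv`, `lora_delta`, `seg`, and `adapters` are hypothetical and chosen for illustration only.

```python
def matvec(x, w):
    """Multiply row vector x (length h_in) by matrix w (h_in x h_out)."""
    h_out = len(w[0])
    return [sum(x[k] * w[k][j] for k in range(len(x))) for j in range(h_out)]


def sgmv(y, x, weights, seg):
    """Assumed semantics: y[r] += x[r] @ weights[i] for rows r in seg[i]:seg[i+1].

    seg holds len(weights) + 1 non-decreasing row offsets, so segment i
    covers the requests that share adapter i.
    """
    for i, w in enumerate(weights):
        for r in range(seg[i], seg[i + 1]):
            delta = matvec(x[r], w)
            y[r] = [a + b for a, b in zip(y[r], delta)]
    return y


def lora_delta(x, adapters, seg):
    """LoRA update x @ A_i @ B_i per segment, as two SGMV passes.

    adapters is a list of (A, B) pairs; A is h_in x rank, B is rank x h_out.
    """
    rank = len(adapters[0][0][0])
    h_out = len(adapters[0][1][0])
    # Shrink: project each row down to the adapter rank.
    v = sgmv([[0.0] * rank for _ in x], x, [a for a, _ in adapters], seg)
    # Expand: project back up to the hidden size.
    return sgmv([[0.0] * h_out for _ in x], v, [b for _, b in adapters], seg)


# Three requests in one batch: rows 0-1 use adapter 0, row 2 uses adapter 1.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
adapters = [
    ([[1.0], [0.0]], [[2.0, 0.0]]),  # adapter 0: rank-1 A and B
    ([[0.0], [1.0]], [[0.0, 3.0]]),  # adapter 1: rank-1 A and B
]
seg = [0, 2, 3]
delta = lora_delta(x, adapters, seg)
print(delta)  # [[2.0, 0.0], [6.0, 0.0], [0.0, 18.0]]
```

The point of the sketch is that one call covers requests for several adapters at once. How the loops are tiled and mapped onto GPU threads is the kind of implementation choice that an automated search such as Dop would explore.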
Cite
Text
Zhou et al. "Dynamic Operator Optimization for Efficient Multi-Tenant LoRA Model Serving." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I21.34453
Markdown
[Zhou et al. "Dynamic Operator Optimization for Efficient Multi-Tenant LoRA Model Serving." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhou2025aaai-dynamic-a/) doi:10.1609/AAAI.V39I21.34453
BibTeX
@inproceedings{zhou2025aaai-dynamic-a,
title = {{Dynamic Operator Optimization for Efficient Multi-Tenant LoRA Model Serving}},
author = {Zhou, Changhai and Zhou, Yuhua and Zhang, Shiyang and Wang, Yibin and Liu, Zekai},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {22910-22918},
doi = {10.1609/AAAI.V39I21.34453},
url = {https://mlanthology.org/aaai/2025/zhou2025aaai-dynamic-a/}
}