QiMeng-GEMM: Automatically Generating High-Performance Matrix Multiplication Code by Exploiting Large Language Models

Abstract

As a crucial operator in numerous scientific and engineering computing applications, the automatic optimization of General Matrix Multiplication (GEMM) with full utilization of ever-evolving hardware architectures (e.g., GPUs and RISC-V) is of paramount importance. While Large Language Models (LLMs) can generate functionally correct code for simple tasks, they have yet to produce high-performance code. The key challenge lies in deeply understanding diverse hardware architectures and crafting prompts that effectively unleash the potential of LLMs to generate high-performance code. In this paper, we propose a novel prompt mechanism called QiMeng-GEMM, which enables LLMs to comprehend the architectural characteristics of different hardware platforms and automatically search for optimization combinations for GEMM. At the heart of QiMeng-GEMM is a set of informative, adaptive, and iterative meta-prompts. Building on these, a search strategy over meta-prompt combinations iteratively generates high-performance code. Extensive experiments conducted on 4 leading LLMs, various paradigmatic hardware platforms, and representative matrix dimensions unequivocally demonstrate QiMeng-GEMM’s superior performance in auto-generating optimized GEMM code. Compared to vanilla prompts, our method achieves a performance enhancement of up to 113×. Even when compared to human experts, our method can reach 115% of cuBLAS on NVIDIA GPUs and 211% of OpenBLAS on RISC-V CPUs. Notably, while human experts often take months to optimize GEMM, our approach reduces the development cost by over 240×.
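To ground the terminology, the sketch below is a generic, illustrative example (not code generated by QiMeng-GEMM, and unrelated to the paper's reported results) contrasting a naive GEMM baseline with a loop-tiled variant. The matrix size `N` and tile size `TILE` are assumed tunables; tiling is one classic member of the optimization combinations (alongside vectorization, unrolling, and layout transforms) that such a generator must select and tune per platform.

```c
#include <string.h>

#define N 64    /* matrix dimension (illustrative, assumed) */
#define TILE 16 /* tile size: one tunable optimization parameter (assumed) */

/* Naive triple-loop baseline: C = A * B for row-major N x N matrices. */
void gemm_naive(const double *A, const double *B, double *C) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double acc = 0.0;
            for (int k = 0; k < N; k++)
                acc += A[i * N + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
}

/* Loop-tiled variant: computes the same product, but works on TILE x TILE
 * blocks so operands are reused while still hot in cache. */
void gemm_tiled(const double *A, const double *B, double *C) {
    memset(C, 0, (size_t)N * N * sizeof(double));
    for (int ii = 0; ii < N; ii += TILE)
        for (int kk = 0; kk < N; kk += TILE)
            for (int jj = 0; jj < N; jj += TILE)
                for (int i = ii; i < ii + TILE; i++)
                    for (int k = kk; k < kk + TILE; k++) {
                        double a = A[i * N + k]; /* hoist the reused operand */
                        for (int j = jj; j < jj + TILE; j++)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```

Picking good values for parameters like `TILE` depends on cache sizes, vector width, and memory hierarchy, which is why per-platform search over such combinations, as the paper's meta-prompts guide, matters for performance.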

Cite

Text

Zhou et al. "QiMeng-GEMM: Automatically Generating High-Performance Matrix Multiplication Code by Exploiting Large Language Models." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I21.34461

Markdown

[Zhou et al. "QiMeng-GEMM: Automatically Generating High-Performance Matrix Multiplication Code by Exploiting Large Language Models." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhou2025aaai-qimeng/) doi:10.1609/AAAI.V39I21.34461

BibTeX

@inproceedings{zhou2025aaai-qimeng,
  title     = {{QiMeng-GEMM: Automatically Generating High-Performance Matrix Multiplication Code by Exploiting Large Language Models}},
  author    = {Zhou, Qirui and Wen, Yuanbo and Chen, Ruizhi and Gao, Ke and Xiong, Weiqiang and Li, Ling and Guo, Qi and Wu, Yanjun and Chen, Yunji},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {22982--22990},
  doi       = {10.1609/AAAI.V39I21.34461},
  url       = {https://mlanthology.org/aaai/2025/zhou2025aaai-qimeng/}
}