Mastering Sparse CUDA Generation Through Pretrained Models and Deep Reinforcement Learning

Wang, Yaoyu; Dai, Hankun; Yang, Zhidong; Xiao, Junmin; Tan, Guangming

ICLR 2026

Abstract

Code generation is a crucial research area in artificial intelligence, with the potential to revolutionize software development and streamline programming. However, generating high-performance code, which must execute quickly in low-latency scenarios, remains a formidable challenge. Existing methods often struggle to account for the irregularity of sparse input data in sparse programs and for the need for domain-specific architectural knowledge, leading to sub-optimal performance. To tackle these issues, we propose the SparseRL framework. SparseRL leverages deep reinforcement learning, treating a pre-trained language model as a stochastic policy. It takes the row and column indices of the non-zero elements of a sparse matrix as input and generates CUDA code for sparse matrix operations as output. We also introduce a domain-specific code generation mechanism for dynamic input, a sinusoidal embedding technique tailored to sparse matrices, and a hierarchical reward function that considers both code correctness and execution efficiency. Experimental results demonstrate that SparseRL achieves state-of-the-art performance. In sparse matrix-vector multiplication (SpMV) tasks, it improves the compilation rate by 20% compared to existing methods, and the generated code runs 30% faster on average. For sparse matrix-dense matrix multiplication (SpMM) tasks, SparseRL also shows significant performance gains. These results highlight the effectiveness of SparseRL in generating high-performance CUDA code for sparse matrix operations.
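To make the ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard Transformer-style sinusoidal encoding applied separately to the row and column index of each non-zero, and a three-tier reward (fails to compile, compiles but gives wrong output, correct with a speedup score). The function names, the tier values -1.0 and -0.5, the embedding dimension, and the speedup formula are all hypothetical choices for illustration; the abstract does not specify them.

```python
import math

def spmv_coo(rows, cols, vals, x, n_rows):
    """Reference SpMV y = A @ x for a matrix given as COO triplets.
    A correctness check for generated code could compare against this."""
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y

def sinusoidal_embedding(index, dim=8, base=10000.0):
    """Transformer-style sinusoidal encoding of one integer index.
    (Assumption: the paper's sparse-specific variant may differ.)"""
    emb = []
    for k in range(dim // 2):
        freq = 1.0 / (base ** (2 * k / dim))
        emb.append(math.sin(index * freq))
        emb.append(math.cos(index * freq))
    return emb

def embed_nonzero(row, col, dim=8):
    """Hypothetical: concatenate the row and column encodings of one non-zero."""
    return sinusoidal_embedding(row, dim) + sinusoidal_embedding(col, dim)

def hierarchical_reward(compiled, correct, runtime_ms, baseline_ms):
    """Hypothetical tiered reward: correctness gates the efficiency term."""
    if not compiled:
        return -1.0            # tier 1: code does not compile
    if not correct:
        return -0.5            # tier 2: compiles, but output is wrong
    if runtime_ms <= 0:
        raise ValueError("runtime_ms must be positive")
    return baseline_ms / runtime_ms   # tier 3: speedup over a baseline kernel

# Usage: a 2x3 matrix with non-zeros at (0,0), (0,2), (1,1).
rows, cols, vals = [0, 0, 1], [0, 2, 1], [1.0, 2.0, 3.0]
y_ref = spmv_coo(rows, cols, vals, [1.0, 1.0, 1.0], n_rows=2)
tokens = [embed_nonzero(r, c) for r, c in zip(rows, cols)]
reward = hierarchical_reward(True, True, runtime_ms=2.0, baseline_ms=4.0)
```

The tiers are ordered so that any correct kernel, even a slow one, scores above every incorrect one. This is one simple way to let a single reward reflect both correctness and efficiency.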

Cite

Text

Wang et al. "Mastering Sparse CUDA Generation Through Pretrained Models and Deep Reinforcement Learning." International Conference on Learning Representations, 2026.

Markdown

[Wang et al. "Mastering Sparse CUDA Generation Through Pretrained Models and Deep Reinforcement Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wang2026iclr-mastering/)

BibTeX

@inproceedings{wang2026iclr-mastering,
  title     = {{Mastering Sparse CUDA Generation Through Pretrained Models and Deep Reinforcement Learning}},
  author    = {Wang, Yaoyu and Dai, Hankun and Yang, Zhidong and Xiao, Junmin and Tan, Guangming},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/wang2026iclr-mastering/}
}