Mastering Sparse CUDA Generation Through Pretrained Models and Deep Reinforcement Learning

Abstract

Code generation is a crucial research area in artificial intelligence, with the potential to revolutionize software development and streamline programming. However, generating high-performance code, which must execute quickly in low-latency scenarios, remains a formidable challenge. Existing methods often struggle to account for the irregularity of sparse input data in sparse programs and for the need for domain-specific architectural knowledge, leading to sub-optimal performance. To tackle these issues, we propose the SparseRL framework. SparseRL leverages deep reinforcement learning, treating a pre-trained language model as a stochastic policy. It takes the row and column indices of the non-zero elements of a sparse matrix as input and generates CUDA code for sparse matrix operations as output. We also introduce a domain-specific code generation mechanism for dynamic input, a sinusoidal embedding technique tailored for sparse matrices, and a hierarchical reward function that considers both code correctness and execution efficiency. Experimental results demonstrate that SparseRL achieves state-of-the-art performance. On sparse matrix-vector multiplication (SpMV) tasks, it improves the compilation rate by 20% compared to existing methods, and the generated code runs 30% faster on average. On sparse matrix-dense matrix multiplication (SpMM) tasks, SparseRL also shows significant performance gains. These results highlight the effectiveness of SparseRL in generating high-performance CUDA code for sparse matrix operations.
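The abstract does not specify how the hierarchical reward is computed, so the following is only a minimal sketch of one way a reward could be tiered by compilation, correctness, and execution time. The function name `hierarchical_reward`, the penalty values, the speedup cap, and the use of a baseline runtime are all assumptions made for illustration and are not taken from the paper.

```python
def hierarchical_reward(compiled, correct, runtime_ms, baseline_ms,
                        compile_penalty=-1.0, wrong_penalty=-0.5,
                        max_speedup=4.0):
    """Hypothetical tiered reward for one generated kernel.

    Tier 1: code that fails to compile gets the lowest reward.
    Tier 2: code that compiles but yields wrong results gets a
            smaller penalty.
    Tier 3: correct code is rewarded by its speedup over a baseline,
            capped so that a single outlier cannot dominate training.
    All constants here are illustrative assumptions.
    """
    if not compiled:
        return compile_penalty
    if not correct:
        return wrong_penalty
    if runtime_ms <= 0 or baseline_ms <= 0:
        raise ValueError("runtimes must be positive")
    speedup = baseline_ms / runtime_ms
    return min(speedup, max_speedup)


if __name__ == "__main__":
    # A kernel that fails to compile, one that is wrong, and one that
    # is correct and twice as fast as the (assumed) baseline.
    print(hierarchical_reward(False, False, 0.0, 2.0))
    print(hierarchical_reward(True, False, 1.0, 2.0))
    print(hierarchical_reward(True, True, 1.0, 2.0))
```

In this sketch every correct kernel scores above every incorrect one, and every compiling kernel scores above every non-compiling one, so the policy is first steered toward valid code and only then toward faster code.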

Cite

Text

Wang et al. "Mastering Sparse CUDA Generation Through Pretrained Models and Deep Reinforcement Learning." International Conference on Learning Representations, 2026.

Markdown

[Wang et al. "Mastering Sparse CUDA Generation Through Pretrained Models and Deep Reinforcement Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wang2026iclr-mastering/)

BibTeX

@inproceedings{wang2026iclr-mastering,
  title     = {{Mastering Sparse CUDA Generation Through Pretrained Models and Deep Reinforcement Learning}},
  author    = {Wang, Yaoyu and Dai, Hankun and Yang, Zhidong and Xiao, Junmin and Tan, Guangming},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/wang2026iclr-mastering/}
}