KernelBench: Can LLMs Write Efficient GPU Kernels?

Abstract

Efficient GPU kernels are crucial for building performant machine learning architectures, but writing them is a time-consuming challenge that requires significant expertise; we therefore explore using language models (LMs) to automate kernel generation. We introduce KernelBench, an open-source framework for evaluating LMs’ ability to write fast and correct kernels on a suite of 250 carefully selected PyTorch ML workloads. KernelBench represents a real-world engineering environment, so progress on the benchmark translates directly into faster practical kernels. We introduce a new evaluation metric, $\text{fast}_p$, which measures the percentage of generated kernels that are functionally correct and offer a speedup greater than an adjustable threshold $p$ over the PyTorch baseline. Our experiments across various state-of-the-art models and test-time methods show that frontier reasoning models perform best out of the box but still fall short overall, matching the PyTorch baseline in fewer than 20% of cases. While we show that results can improve by leveraging execution and profiling feedback during iterative refinement, KernelBench remains a challenging benchmark, and its difficulty increases as the speedup threshold $p$ rises.
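One way to write the metric described above, over $N$ tasks, is $\text{fast}_p = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\left[\text{correct}_i \wedge \text{speedup}_i > p\right]$. The Python sketch below computes it under stated assumptions: speedup is taken to be the baseline runtime divided by the generated kernel's runtime, and a kernel that fails the correctness check counts as a miss regardless of its speed. The `KernelResult` record and its field names are hypothetical illustrations, not part of the KernelBench codebase.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class KernelResult:
    """Hypothetical record for one task: correctness plus measured runtimes."""
    correct: bool        # did the generated kernel match the reference outputs?
    baseline_ms: float   # runtime of the PyTorch reference implementation
    generated_ms: float  # runtime of the generated kernel


def speedup(r: KernelResult) -> float:
    # Assumption: speedup = baseline time / generated time (> 1 means faster).
    return r.baseline_ms / r.generated_ms


def fast_p(results: Sequence[KernelResult], p: float) -> float:
    """Fraction of tasks whose kernel is correct AND has speedup strictly above p."""
    if not results:
        raise ValueError("fast_p needs at least one result")
    hits = sum(1 for r in results if r.correct and speedup(r) > p)
    return hits / len(results)  # multiply by 100 to report a percentage


# Usage: four tasks with made-up timings, for illustration only.
results = [
    KernelResult(correct=True, baseline_ms=10.0, generated_ms=5.0),    # 2.0x
    KernelResult(correct=True, baseline_ms=10.0, generated_ms=12.5),   # 0.8x
    KernelResult(correct=False, baseline_ms=10.0, generated_ms=2.0),   # wrong, so a miss
    KernelResult(correct=True, baseline_ms=10.0, generated_ms=8.0),    # 1.25x
]
print(fast_p(results, 0.0))  # 0.75: with p = 0 this is just the correctness rate
print(fast_p(results, 1.0))  # 0.5: correct and faster than the baseline
print(fast_p(results, 1.5))  # 0.25
```

Because every measured speedup is positive, $\text{fast}_0$ reduces to the plain correctness rate, and raising $p$ can only keep or lower the score, which is consistent with the benchmark growing harder as the threshold increases.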

Cite

Ouyang et al. "KernelBench: Can LLMs Write Efficient GPU Kernels?" Proceedings of the 42nd International Conference on Machine Learning, 2025.

BibTeX

@inproceedings{ouyang2025icml-kernelbench,
  title     = {{KernelBench: Can LLMs Write Efficient GPU Kernels?}},
  author    = {Ouyang, Anne and Guo, Simon and Arora, Simran and Zhang, Alex L. and Hu, William and R{\'e}, Christopher and Mirhoseini, Azalia},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {47356--47415},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/ouyang2025icml-kernelbench/}
}