AutoGraph: Optimizing DNN Computation Graph for Parallel GPU Kernel Execution

Abstract

Deep learning frameworks optimize computation graphs and intra-operator computations to boost inference performance on GPUs, while inter-operator parallelism is usually ignored. In this paper, a unified framework, AutoGraph, is proposed to obtain highly optimized computation graphs that favor parallel execution of GPU kernels. A novel dynamic programming algorithm, combined with backtracking search, is adopted to explore the optimal graph optimization solution, guided by fast performance estimation from a mixed critical path cost. Accurate runtime information, obtained by launching GPU multi-stream execution with CUDA Graph, is used to determine the convergence of the optimization. Experimental results demonstrate that our method achieves up to 3.47x speedup over existing graph optimization methods. Moreover, AutoGraph outperforms state-of-the-art parallel kernel launch frameworks by up to 1.26x.
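The abstract relies on launching kernels across multiple GPU streams and capturing them into a CUDA Graph for low-overhead replay. The following is a minimal sketch of that mechanism only, not the authors' implementation: two placeholder kernels run concurrently on separate streams, are captured into a single graph via the standard event fork/join pattern, and the instantiated graph is replayed. Kernel names, sizes, and the (CUDA 11-style) `cudaGraphInstantiate` signature are illustrative assumptions.

```cpp
// Sketch: multi-stream kernel capture into a CUDA Graph (illustrative only).
#include <cuda_runtime.h>

__global__ void kernelA(float* x) { x[threadIdx.x] *= 2.0f; }  // placeholder op
__global__ void kernelB(float* y) { y[threadIdx.x] += 1.0f; }  // placeholder op

int main() {
    float *x, *y;
    cudaMalloc(&x, 256 * sizeof(float));
    cudaMalloc(&y, 256 * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    cudaEvent_t fork, join;
    cudaEventCreate(&fork);
    cudaEventCreate(&join);

    // Begin capture on s1; fork s2 off s1 so both streams join the capture.
    cudaGraph_t graph;
    cudaStreamBeginCapture(s1, cudaStreamCaptureModeGlobal);
    cudaEventRecord(fork, s1);
    cudaStreamWaitEvent(s2, fork, 0);

    kernelA<<<1, 256, 0, s1>>>(x);   // captured on stream 1
    kernelB<<<1, 256, 0, s2>>>(y);   // captured on stream 2, may run concurrently

    // Join s2 back into s1 before ending the capture.
    cudaEventRecord(join, s2);
    cudaStreamWaitEvent(s1, join, 0);
    cudaStreamEndCapture(s1, &graph);

    // Instantiate once, then replay the whole multi-stream schedule cheaply.
    cudaGraphExec_t graphExec;
    cudaGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0);
    cudaGraphLaunch(graphExec, s1);
    cudaStreamSynchronize(s1);

    cudaGraphExecDestroy(graphExec);
    cudaGraphDestroy(graph);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```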

Cite

Text

Zhao et al. "AutoGraph: Optimizing DNN Computation Graph for Parallel GPU Kernel Execution." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I9.26343

Markdown

[Zhao et al. "AutoGraph: Optimizing DNN Computation Graph for Parallel GPU Kernel Execution." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/zhao2023aaai-autograph/) doi:10.1609/AAAI.V37I9.26343

BibTeX

@inproceedings{zhao2023aaai-autograph,
  title     = {{AutoGraph: Optimizing DNN Computation Graph for Parallel GPU Kernel Execution}},
  author    = {Zhao, Yuxuan and Sun, Qi and He, Zhuolun and Bai, Yang and Yu, Bei},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {11354--11362},
  doi       = {10.1609/AAAI.V37I9.26343},
  url       = {https://mlanthology.org/aaai/2023/zhao2023aaai-autograph/}
}