Complementary Sparsity: Accelerating Sparse CNNs with High Accuracy on General-Purpose Computing Platforms

Abstract

Model sparsity is a promising approach to reducing the parameters or FLOPs of convolutional neural networks (CNNs). Compared to unstructured or coarse-grained structured sparsity, fine-grained structured sparsity, e.g., the N:M sparse pattern, can achieve a better balance between accuracy and efficiency on general computing platforms such as CPUs and GPUs. In particular, 2:4 sparsity can accelerate CNN inference by 2$\times$ with a negligible accuracy drop. However, N:M sparsity requires dedicated hardware circuits on the GPU and hardly achieves significant speedups on common GPUs. To accelerate CNNs with general-purpose computing resources while retaining model accuracy as much as possible, this paper proposes complementary sparsity (CS). Under CS, among weights spaced at the same distance, only one weight is retained. On the one hand, CS features high mask flexibility, which is naturally favorable to high model accuracy. Moreover, we propose a CS-specific sparse training method to improve the accuracy of CS-based CNNs under high parameter sparsities ($>$75\%). On the other hand, CS itself is memory-access balanced and robust to pattern hyperparameters, which can be exploited to speed up CS-based convolution on CPUs and common GPUs. We thus propose a CS convolution parallel computing algorithm that adapts to common GPUs without sparse tensor cores. Experimental results show that, compared to other sparsity patterns, the proposed CS achieves the best accuracy-latency trade-off on both CPUs and common GPUs. Code will be available at https://gitee.com/mindspore/models/tree/master/research/cv/CS.
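To make the CS pattern concrete, below is a minimal NumPy sketch of the masking rule as stated in the abstract: weights spaced the same distance apart form a group, and exactly one weight per group is retained. The magnitude-based selection of the retained weight is an illustrative assumption; the paper uses a CS-specific sparse training method to determine the mask.

```python
import numpy as np

def complementary_sparsity_mask(w: np.ndarray, distance: int) -> np.ndarray:
    """Build a CS mask for a weight tensor (flattened to 1-D).

    Indices spaced `distance` apart form a group; only one weight per
    group is retained. Selecting the largest-magnitude weight here is
    an assumption for illustration, not the paper's training method.
    """
    flat = w.reshape(-1)
    mask = np.zeros_like(flat)
    for start in range(distance):
        group = np.arange(start, flat.size, distance)  # indices spaced `distance` apart
        keep = group[np.argmax(np.abs(flat[group]))]   # single retained weight per group
        mask[keep] = 1.0
    return mask.reshape(w.shape)

# Example: 8 weights, distance 2 -> two strided groups, one weight kept
# per group, i.e., 75% parameter sparsity.
w = np.array([0.3, -1.2, 0.8, 0.1, -0.5, 0.9, 0.2, -0.4])
print(complementary_sparsity_mask(w, distance=2))  # [0. 1. 1. 0. 0. 0. 0. 0.]
```

Note how every group touches memory at the same fixed stride, which is consistent with the abstract's claim that CS is memory-access balanced.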

Cite

Text

Zhao et al. "Complementary Sparsity: Accelerating Sparse CNNs with High Accuracy on General-Purpose Computing Platforms." Transactions on Machine Learning Research, 2023.

Markdown

[Zhao et al. "Complementary Sparsity: Accelerating Sparse CNNs with High Accuracy on General-Purpose Computing Platforms." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/zhao2023tmlr-complementary/)

BibTeX

@article{zhao2023tmlr-complementary,
  title     = {{Complementary Sparsity: Accelerating Sparse CNNs with High Accuracy on General-Purpose Computing Platforms}},
  author    = {Zhao, Kang and Tan, Yijun and Han, Kai and Hu, Ting and Chen, Hanting and Yuan, Tao and Wang, Yunhe and Yao, Jun},
  journal   = {Transactions on Machine Learning Research},
  year      = {2023},
  url       = {https://mlanthology.org/tmlr/2023/zhao2023tmlr-complementary/}
}