DEPrune: Depth-Wise Separable Convolution Pruning for Maximizing GPU Parallelism
Abstract
Depth-wise Separable Convolution (DSConv) has a powerful representation even with fewer parameters and computation, leading to its adoption by almost all of the state-of-the-art CNN models. DSConv models are already compact making it hard to apply pruning, and there are few previous pruning techniques that target depth-wise convolution (DW-conv).In this paper, we present Depth-wise Separable Convolution Pruning (DEPrune), a novel pruning method applied to both point-wise and depth-wise convolutions. DEPrune is optimized by analyzing the computation of DSConv on GPUs.DEPrune employs a fine-grained pruning approach, yet it achieves the structured sparsity typically absent in fine-grained pruning, enabling practical hardware acceleration. Moreover, this method maintains a high pruning ratio without causing any accuracy drop.We additionally represent techniques that further enhance DEPrune performance: 1) balanced workload tuning (BWT), and 2) hardware-aware sparsity recalibration (HSR).Experiment results show that DEPrune achieves up to $3.74\times$ practical speedup in DSConv inference on GPUs while maintaining the accuracy of EfficientNet-B0 on ImageNet.
Cite
Text
Park et al. "DEPrune: Depth-Wise Separable Convolution Pruning for Maximizing GPU Parallelism." Neural Information Processing Systems, 2024. doi:10.52202/079017-3394Markdown
[Park et al. "DEPrune: Depth-Wise Separable Convolution Pruning for Maximizing GPU Parallelism." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/park2024neurips-deprune/) doi:10.52202/079017-3394BibTeX
@inproceedings{park2024neurips-deprune,
title = {{DEPrune: Depth-Wise Separable Convolution Pruning for Maximizing GPU Parallelism}},
author = {Park, Cheonjun and Park, Mincheol and Moon, Hyunchan and Yoon, Myung Kuk and Go, Seokjin and Kim, Suhyun and Ro, Won Woo},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-3394},
url = {https://mlanthology.org/neurips/2024/park2024neurips-deprune/}
}