Learning Sparse Neural Networks Through Mixture-Distributed Regularization

Abstract

L0-norm regularization is one of the most effective approaches to learning a sparse neural network. Due to its discrete nature, differentiable approximate regularizers based on the concrete distribution [31] or its variants have been proposed as alternatives; however, the concrete relaxation suffers from high-variance gradient estimates and is restricted to the concrete distribution itself. To address these issues, in this paper, we propose a more general framework for relaxing binary gates through mixture distributions. With the proposed method, any mixture pair of distributions converging to δ(0) and δ(1) can be applied to construct smoothed binary gates. We further introduce a reparameterization method for the smoothed binary gates drawn from mixture distributions, enabling efficient gradient-based optimization under the proposed deep learning algorithm. Extensive experiments are conducted, and the results show that the proposed approach achieves better performance in terms of pruned architectures, structured sparsity, and the reduced number of floating-point operations (FLOPs) as compared with other state-of-the-art sparsity-inducing methods.
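To make the idea concrete, here is a minimal NumPy sketch of sampling a smoothed binary gate from a two-component mixture. The component choices (Beta(1, b) concentrating toward δ(0) and Beta(a, 1) concentrating toward δ(1)) and the Gumbel-softmax relaxation of the mixture weight are illustrative assumptions, not necessarily the paper's exact construction; both components and the relaxed weight are reparameterized via inverse-CDF and logistic-noise transforms so the sample is differentiable with respect to the gate parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture_gate(logit_pi, temp=0.5, a=5.0, b=5.0, size=1000):
    """Sample smoothed binary gates from a two-component mixture (illustrative).

    Component 0 ~ Beta(1, b): concentrates near 0 as b grows (approaches delta(0)).
    Component 1 ~ Beta(a, 1): concentrates near 1 as a grows (approaches delta(1)).
    The discrete mixture indicator is relaxed with a binary Gumbel-softmax
    (concrete) weight so the sample is differentiable w.r.t. logit_pi.
    """
    # Reparameterized relaxed-Bernoulli mixture weight in (0, 1):
    # sigmoid((logit_pi + logistic noise) / temperature).
    u = rng.uniform(1e-6, 1 - 1e-6, size)
    w = 1.0 / (1.0 + np.exp(-(logit_pi + np.log(u) - np.log(1.0 - u)) / temp))

    # Reparameterized component samples via inverse CDFs:
    # Beta(1, b) has CDF 1 - (1 - x)^b; Beta(a, 1) has CDF x^a.
    v0 = rng.uniform(size=size)
    v1 = rng.uniform(size=size)
    z0 = 1.0 - (1.0 - v0) ** (1.0 / b)   # mass near 0
    z1 = v1 ** (1.0 / a)                 # mass near 1

    # Convex combination of the two components: a gate in [0, 1].
    return w * z1 + (1.0 - w) * z0

gates = sample_mixture_gate(logit_pi=2.0)
```

In this sketch, pushing `temp` toward 0 and `a`, `b` toward infinity recovers hard binary gates, while moderate values keep the sampler smooth enough for gradient-based training.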

Cite

Text

Huang et al. "Learning Sparse Neural Networks Through Mixture-Distributed Regularization." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020. doi:10.1109/CVPRW50498.2020.00355

Markdown

[Huang et al. "Learning Sparse Neural Networks Through Mixture-Distributed Regularization." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020.](https://mlanthology.org/cvprw/2020/huang2020cvprw-learning/) doi:10.1109/CVPRW50498.2020.00355

BibTeX

@inproceedings{huang2020cvprw-learning,
  title     = {{Learning Sparse Neural Networks Through Mixture-Distributed Regularization}},
  author    = {Huang, Chang-Ti and Chen, Jun-Cheng and Wu, Ja-Ling},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2020},
  pages     = {2968--2977},
  doi       = {10.1109/CVPRW50498.2020.00355},
  url       = {https://mlanthology.org/cvprw/2020/huang2020cvprw-learning/}
}