Variational Dropout Sparsifies Deep Neural Networks

Abstract

We explore a recently proposed Variational Dropout technique that provides an elegant Bayesian interpretation of Gaussian Dropout. We extend Variational Dropout to the case when dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator, and report the first experimental results with individual dropout rates per weight. Interestingly, this leads to extremely sparse solutions in both fully-connected and convolutional layers. The effect is similar to the automatic relevance determination effect in empirical Bayes, but it has a number of advantages. We reduce the number of parameters up to 280 times on LeNet architectures and up to 68 times on VGG-like networks with a negligible decrease in accuracy.
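
A minimal sketch of the idea described in the abstract, for illustration only: a linear layer with an individual dropout rate per weight, where weights whose learned dropout rate grows large are pruned at test time. The PyTorch-style layer below, the additive-noise parameterization sigma^2 = alpha * theta^2, the polynomial approximation of the KL term, and the log(alpha) > 3 pruning threshold follow common Sparse Variational Dropout implementations; they are assumptions not spelled out in the abstract and may differ from the authors' exact code.

# Hypothetical sketch of a linear layer with per-weight dropout rates.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearSVDO(nn.Module):
    def __init__(self, in_features, out_features, threshold=3.0):
        super().__init__()
        self.theta = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.log_sigma2 = nn.Parameter(torch.full((out_features, in_features), -10.0))
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.threshold = threshold  # prune weights with log(alpha) above this value (assumed = 3)

    def log_alpha(self):
        # alpha = sigma^2 / theta^2, so log(alpha) = log(sigma^2) - log(theta^2)
        return torch.clamp(self.log_sigma2 - torch.log(self.theta ** 2 + 1e-8), -10.0, 10.0)

    def forward(self, x):
        if self.training:
            # Local reparameterization: sample pre-activations to reduce gradient variance.
            mu = F.linear(x, self.theta, self.bias)
            var = F.linear(x ** 2, torch.exp(self.log_sigma2)) + 1e-8
            return mu + torch.sqrt(var) * torch.randn_like(mu)
        # At test time, drop weights whose dropout rate exceeds the threshold.
        mask = (self.log_alpha() < self.threshold).float()
        return F.linear(x, self.theta * mask, self.bias)

    def kl(self):
        # Polynomial approximation of the KL penalty that drives sparsity (assumed constants).
        k1, k2, k3 = 0.63576, 1.87320, 1.48695
        la = self.log_alpha()
        neg_kl = k1 * torch.sigmoid(k2 + k3 * la) - 0.5 * torch.log1p(torch.exp(-la)) - k1
        return -neg_kl.sum()

In such a sketch, training would minimize the usual data loss plus the sum of kl() over all layers, and the fraction of weights with log(alpha) above the threshold gives the achieved sparsity.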

Cite

Text

Molchanov et al. "Variational Dropout Sparsifies Deep Neural Networks." International Conference on Machine Learning, 2017.

Markdown

[Molchanov et al. "Variational Dropout Sparsifies Deep Neural Networks." International Conference on Machine Learning, 2017.](https://mlanthology.org/icml/2017/molchanov2017icml-variational/)

BibTeX

@inproceedings{molchanov2017icml-variational,
  title     = {{Variational Dropout Sparsifies Deep Neural Networks}},
  author    = {Molchanov, Dmitry and Ashukha, Arsenii and Vetrov, Dmitry},
  booktitle = {International Conference on Machine Learning},
  year      = {2017},
  pages     = {2498--2507},
  volume    = {70},
  url       = {https://mlanthology.org/icml/2017/molchanov2017icml-variational/}
}