Differentiable Architecture Compression

Abstract

In many learning situations, resources at inference time are significantly more constrained than resources at training time. This paper studies a general paradigm, called Differentiable ARchitecture Compression (DARC), that combines model compression and architecture search to learn models that are resource-efficient at inference time. Given a resource-intensive base architecture, DARC utilizes the training data to learn which sub-components can be replaced by cheaper alternatives. The high-level technique can be applied to any neural architecture, and we report experiments on state-of-the-art convolutional neural networks for image classification. For a WideResNet with 97.2% accuracy on CIFAR-10, we improve single-sample inference speed by 2.28X and reduce memory footprint by 5.64X, with no accuracy loss. For a ResNet with 79.15% Top-1 accuracy on ImageNet, we improve batch inference speed by 1.29X and reduce memory footprint by 3.57X, with 1% accuracy loss. We also give theoretical Rademacher complexity bounds in simplified cases, showing how DARC avoids over-fitting despite over-parameterization.
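To make the mechanism concrete, below is a minimal sketch of one way such a differentiable choice can be set up, assuming a DARTS-style softmax relaxation over a base operation and its cheaper candidates. The abstract does not spell out the exact relaxation, so the class and names here are hypothetical illustrations, not the paper's code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CompressibleBlock(nn.Module):
    """Hypothetical sketch: a differentiable choice between a base op and
    cheaper alternatives, in the spirit of DARC's sub-component replacement.
    Assumes all candidates produce outputs of the same shape."""

    def __init__(self, base_op, cheap_ops):
        super().__init__()
        self.candidates = nn.ModuleList([base_op] + list(cheap_ops))
        # One architecture weight per candidate, learned from the training data.
        self.alpha = nn.Parameter(torch.zeros(len(self.candidates)))

    def forward(self, x):
        # During training: a softmax-weighted mixture, so the choice of
        # candidate is differentiable and trains jointly with the model weights.
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.candidates))

    def compress(self):
        # After training: keep only the highest-weight candidate, which is
        # where the inference-time speed and memory savings come from.
        return self.candidates[int(self.alpha.argmax())]

# Usage: a 3x3 convolution with two cheaper replacement candidates.
block = CompressibleBlock(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),       # expensive base op
    [nn.Conv2d(64, 64, kernel_size=1), nn.Identity()]  # cheaper alternatives
)
y = block(torch.randn(1, 64, 32, 32))  # differentiable mixture while training
fast_op = block.compress()             # single cheap op kept for inference

Discretizing the learned choice via compress() is what yields the smaller, faster inference-time model reported in the abstract's experiments.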

Cite

Text

Singh et al. "Differentiable Architecture Compression." International Conference on Learning Representations, 2020.

Markdown

[Singh et al. "Differentiable Architecture Compression." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/singh2020iclr-differentiable/)

BibTeX

@inproceedings{singh2020iclr-differentiable,
  title     = {{Differentiable Architecture Compression}},
  author    = {Singh, Shashank and Khetan, Ashish and Karnin, Zohar},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/singh2020iclr-differentiable/}
}