Making Models Shallow Again: Jointly Learning to Reduce Non-Linearity and Depth for Latency-Efficient Private Inference

Abstract

The large number of ReLU and MAC operations in deep neural networks makes them ill-suited for latency- and compute-efficient private inference. In this paper, we present a model optimization method that allows a model to learn to be shallow. In particular, we leverage the ReLU sensitivity of a convolutional block to remove a ReLU layer and merge its preceding and succeeding convolution layers into a shallow block. Unlike existing ReLU reduction methods, our joint reduction method can yield models with improved reduction of both ReLUs and linear operations, by up to 1.73× and 1.47×, respectively, evaluated with ResNet18 on CIFAR-100 without any significant accuracy drop.
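The abstract's core observation is that once the ReLU between two convolution layers is removed, the two layers compose into a single linear operation and can be merged into one shallower block. Below is a minimal, self-contained sketch (not the authors' code) of that fact; the layer shapes and the `fuse_convs` helper are illustrative assumptions, and the sketch uses unpadded convolutions so the fused result matches exactly.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def fuse_convs(w1: torch.Tensor, w2: torch.Tensor) -> torch.Tensor:
    """Fuse conv1 (w1: [C_mid, C_in, k1, k1]) followed by conv2 (w2: [C_out, C_mid, k2, k2])
    into a single kernel of shape [C_out, C_in, k1 + k2 - 1, k1 + k2 - 1].
    Valid only when there is no non-linearity in between (i.e., the ReLU was removed)."""
    k2 = w2.shape[-1]
    # Full spatial convolution of the two kernels, summed over the middle channels.
    fused = F.conv2d(w1.permute(1, 0, 2, 3), w2.flip(2, 3), padding=k2 - 1)
    return fused.permute(1, 0, 2, 3)

# Two stacked 3x3 convolutions whose intermediate ReLU has been dropped
# (e.g., because the block's ReLU sensitivity is low).
c_in, c_mid, c_out, k = 3, 8, 16, 3
w1 = torch.randn(c_mid, c_in, k, k)
w2 = torch.randn(c_out, c_mid, k, k)

x = torch.randn(1, c_in, 32, 32)
deep = F.conv2d(F.conv2d(x, w1), w2)        # two layers, no ReLU in between
shallow = F.conv2d(x, fuse_convs(w1, w2))   # one fused 5x5 layer

print(torch.allclose(deep, shallow, rtol=1e-4, atol=1e-4))  # True: the block got shallower
```

The merged block trades two small kernels for one larger one, which is where the reported reduction in both ReLU count and linear (MAC) operations comes from.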

Cite

Text

Kundu et al. "Making Models Shallow Again: Jointly Learning to Reduce Non-Linearity and Depth for Latency-Efficient Private Inference." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00494

Markdown

[Kundu et al. "Making Models Shallow Again: Jointly Learning to Reduce Non-Linearity and Depth for Latency-Efficient Private Inference." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/kundu2023cvprw-making/) doi:10.1109/CVPRW59228.2023.00494

BibTeX

@inproceedings{kundu2023cvprw-making,
  title     = {{Making Models Shallow Again: Jointly Learning to Reduce Non-Linearity and Depth for Latency-Efficient Private Inference}},
  author    = {Kundu, Souvik and Zhang, Yuke and Chen, Dake and Beerel, Peter A.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2023},
  pages     = {4685--4689},
  doi       = {10.1109/CVPRW59228.2023.00494},
  url       = {https://mlanthology.org/cvprw/2023/kundu2023cvprw-making/}
}