Making Models Shallow Again: Jointly Learning to Reduce Non-Linearity and Depth for Latency-Efficient Private Inference
Abstract
The large number of ReLU and MAC operations in deep neural networks makes them ill-suited for latency- and compute-efficient private inference. In this paper, we present a model optimization method that allows a model to learn to be shallow. In particular, we leverage the ReLU sensitivity of a convolutional block to remove a ReLU layer and merge its preceding and succeeding convolution layers into a single shallow block. Unlike existing ReLU reduction methods, our joint reduction method yields models with improved reduction of both ReLUs and linear operations, by up to 1.73× and 1.47×, respectively, evaluated with ResNet18 on CIFAR-100 without any significant accuracy drop.
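The core mechanism, collapsing two adjacent convolutions into one once the ReLU between them is removed, can be illustrated with a short PyTorch sketch. This is not the authors' implementation; the 1x1-then-3x3 configuration, the layer sizes, and the no-padding assumption are chosen only to keep the fusion exact and easy to verify.

```python
# Minimal sketch: with the ReLU removed, two stacked convolutions compose into
# a single linear map and can be replaced by one equivalent convolution.
# Assumed setup: 1x1 conv followed by 3x3 conv, stride 1, no padding.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
c_in, c_mid, c_out = 8, 16, 32

conv1 = torch.nn.Conv2d(c_in, c_mid, kernel_size=1, bias=True)   # preceding conv
conv2 = torch.nn.Conv2d(c_mid, c_out, kernel_size=3, bias=True)  # succeeding conv

with torch.no_grad():
    # Fused 3x3 kernel: W_f[o, i] = sum_m W2[o, m] * W1[m, i]
    w_fused = F.conv2d(conv2.weight, conv1.weight.permute(1, 0, 2, 3))
    # Fused bias: conv2 applied to the constant map produced by conv1's bias,
    # plus conv2's own bias (exact only because no zero padding is used).
    b_fused = (conv2.weight * conv1.bias.reshape(1, -1, 1, 1)).sum(dim=(1, 2, 3)) + conv2.bias

x = torch.randn(2, c_in, 32, 32)
y_deep = conv2(conv1(x))                   # two-layer block, ReLU already removed
y_shallow = F.conv2d(x, w_fused, b_fused)  # single equivalent (shallow) conv
print(torch.allclose(y_deep, y_shallow, atol=1e-5))  # True
```

With zero padding on the second convolution or general kernel sizes, the fused kernel and bias need additional care; the sketch only shows why removing a ReLU makes such a merge possible.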
Cite
Text
Kundu et al. "Making Models Shallow Again: Jointly Learning to Reduce Non-Linearity and Depth for Latency-Efficient Private Inference." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00494

Markdown

[Kundu et al. "Making Models Shallow Again: Jointly Learning to Reduce Non-Linearity and Depth for Latency-Efficient Private Inference." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/kundu2023cvprw-making/) doi:10.1109/CVPRW59228.2023.00494

BibTeX
@inproceedings{kundu2023cvprw-making,
title = {{Making Models Shallow Again: Jointly Learning to Reduce Non-Linearity and Depth for Latency-Efficient Private Inference}},
author = {Kundu, Souvik and Zhang, Yuke and Chen, Dake and Beerel, Peter A.},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2023},
pages = {4685--4689},
doi = {10.1109/CVPRW59228.2023.00494},
url = {https://mlanthology.org/cvprw/2023/kundu2023cvprw-making/}
}