Empowering Adaptive Early-Exit Inference with Latency Awareness

Abstract

Adaptive early-exit inference, which trades accuracy for latency on the fly, has emerged as a promising line of research for accelerating deep learning inference. However, studies in this line commonly use a group of thresholds to control the accuracy-latency trade-off, and a thorough, general methodology for determining these thresholds has not yet been developed, especially with regard to the common requirement on average inference latency. To address this issue and enable latency-aware adaptive early-exit inference, we approximately formulate the threshold determination problem of finding the accuracy-maximizing threshold setting that meets a given average latency requirement, and then propose a threshold determination method to tackle the formulated non-convex problem. Theoretically, we prove that, for certain parameter settings, our method finds an approximate stationary point of the formulated problem. Empirically, on various models across multiple datasets (CIFAR-10, CIFAR-100, ImageNet, and two time-series datasets), we show that our method handles average latency requirements well and consistently finds good threshold settings in negligible time.
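
To make the role of the thresholds concrete, the following is a minimal sketch of how a threshold setting determines both accuracy and average latency in a multi-exit model. It is not the paper's threshold determination method; it only evaluates a given setting. All names (`exit_index`, `evaluate`, `confidences`, `latencies`) are hypothetical, and the sketch assumes that a sample leaves at the first exit whose confidence is at least that exit's threshold, that the final exit always accepts, and that `latencies[k]` is the cumulative latency of running the model up to exit `k`.

```python
from typing import List, Sequence, Tuple


def exit_index(confidences: Sequence[float], thresholds: Sequence[float]) -> int:
    """Return the index of the exit at which one sample leaves the model.

    Assumption: there is one threshold per intermediate exit, so
    len(thresholds) == len(confidences) - 1, and the last exit always accepts.
    """
    for k, (conf, thr) in enumerate(zip(confidences, thresholds)):
        if conf >= thr:
            return k
    return len(confidences) - 1


def evaluate(
    confidences: List[Sequence[float]],  # per-sample confidence at each exit
    correct: List[Sequence[bool]],       # per-sample correctness at each exit
    thresholds: Sequence[float],         # one threshold per intermediate exit
    latencies: Sequence[float],          # cumulative latency up to each exit
) -> Tuple[float, float]:
    """Return (accuracy, average latency) of a threshold setting on a dataset."""
    n = len(confidences)
    hits = 0
    total_latency = 0.0
    for sample_conf, sample_correct in zip(confidences, correct):
        k = exit_index(sample_conf, thresholds)
        hits += int(sample_correct[k])
        total_latency += latencies[k]
    return hits / n, total_latency / n


if __name__ == "__main__":
    # Toy data (made up for illustration): two samples, two exits.
    conf = [[0.9, 0.95], [0.4, 0.8]]
    corr = [[True, True], [False, True]]
    lat = [1.0, 3.0]
    print(evaluate(conf, corr, [0.5], lat))  # stricter threshold
    print(evaluate(conf, corr, [0.3], lat))  # looser threshold
```

On this toy data, lowering the threshold reduces the average latency and also the accuracy, which is the trade-off the thresholds control. Under these assumptions, the problem described in the abstract amounts to searching for the `thresholds` that maximize the first returned value while keeping the second at or below a given budget.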

Cite

Text

Tan et al. "Empowering Adaptive Early-Exit Inference with Latency Awareness." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I11.17181

Markdown

[Tan et al. "Empowering Adaptive Early-Exit Inference with Latency Awareness." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/tan2021aaai-empowering/) doi:10.1609/AAAI.V35I11.17181

BibTeX

@inproceedings{tan2021aaai-empowering,
  title     = {{Empowering Adaptive Early-Exit Inference with Latency Awareness}},
  author    = {Tan, Xinrui and Li, Hongjia and Wang, Liming and Huang, Xueqing and Xu, Zhen},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {9825-9833},
  doi       = {10.1609/AAAI.V35I11.17181},
  url       = {https://mlanthology.org/aaai/2021/tan2021aaai-empowering/}
}