Progressive Knowledge Distillation: Balancing Inference Latency and Accuracy at Runtime

Abstract

We study the problem of progressive distillation: given a large, pretrained teacher model $g$, we seek to decompose it into smaller student models $f_i$ with low inference cost, such that progressively evaluating additional models in this ensemble yields strict improvements over previous predictions. For user-facing inference applications, this allows us to flexibly trade accuracy for inference latency at runtime. We develop a boosting-based algorithm, B-DISTIL, for progressive distillation, and demonstrate its effectiveness on standard datasets.
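To make the runtime trade-off concrete, the following is a minimal sketch of how an ordered ensemble of students might be evaluated under a cost budget. It is not B-DISTIL itself and does not show how the students are trained. The function `progressive_predict`, the abstract per-model `cost` units, the additive combination of scores, and the toy students are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: each student is (model, cost), where the model
# maps an input to per-class scores and cost is an abstract latency unit.
Student = Tuple[Callable[[Sequence[float]], List[float]], float]


def progressive_predict(
    students: Sequence[Student], x: Sequence[float], budget: float
) -> Tuple[int, int]:
    """Evaluate students in order, summing their scores, until the budget runs out.

    Returns (predicted_label, number_of_students_evaluated).
    Assumption: the first student always runs, so a prediction is always available.
    """
    if not students:
        raise ValueError("at least one student model is required")

    total: List[float] = []
    spent = 0.0
    used = 0
    for model, cost in students:
        # Stop before evaluating a student that would exceed the budget,
        # except for the very first one.
        if used > 0 and spent + cost > budget:
            break
        scores = model(x)
        total = list(scores) if used == 0 else [t + s for t, s in zip(total, scores)]
        spent += cost
        used += 1

    # Predict the class with the largest accumulated score.
    label = max(range(len(total)), key=total.__getitem__)
    return label, used


# Toy students with fixed outputs (illustrative only, not trained models).
toy_students: List[Student] = [
    (lambda x: [0.6, 0.4], 1.0),
    (lambda x: [-0.5, 0.3], 1.0),
    (lambda x: [0.0, 0.2], 2.0),
]

if __name__ == "__main__":
    for budget in (1.0, 2.0, 4.0):
        print(budget, progressive_predict(toy_students, [0.0], budget))
```

In this toy setup, a larger budget lets later students revise the first student's prediction. A caller with a tight latency target would pass a small budget and accept the coarser answer.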

Cite

Text

Dennis et al. "Progressive Knowledge Distillation: Balancing Inference Latency and Accuracy at Runtime." ICML 2023 Workshops: ES-FoMO, 2023.

Markdown

[Dennis et al. "Progressive Knowledge Distillation: Balancing Inference Latency and Accuracy at Runtime." ICML 2023 Workshops: ES-FoMO, 2023.](https://mlanthology.org/icmlw/2023/dennis2023icmlw-progressive/)

BibTeX

@inproceedings{dennis2023icmlw-progressive,
  title     = {{Progressive Knowledge Distillation: Balancing Inference Latency and Accuracy at Runtime}},
  author    = {Dennis, Don and Shetty, Abhishek and Sevekari, Anish and Koishida, Kazuhito and Smith, Virginia},
  booktitle = {ICML 2023 Workshops: ES-FoMO},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/dennis2023icmlw-progressive/}
}