Progressive Ensemble Distillation: Building Ensembles for Efficient Inference

Abstract

Knowledge distillation is commonly used to compress an ensemble of models into a single model. In this work we study the problem of progressive ensemble distillation: given a large, pretrained teacher model, we seek to decompose it into an ensemble of smaller, low-inference-cost student models. The resulting ensemble allows for flexibly tuning accuracy vs. inference cost, which can be useful for a multitude of applications in efficient inference. Our method, B-DISTIL, uses a boosting procedure that allows function-composition-based aggregation rules to construct expressive ensembles with performance similar to the teacher's while using much smaller student models. We demonstrate the effectiveness of B-DISTIL by decomposing pretrained models across a variety of image, speech, and sensor datasets. Our method comes with strong theoretical guarantees in terms of convergence as well as generalization.
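
The abstract describes B-DISTIL only at a high level. As a rough illustration of the general boosting-style idea behind progressive ensemble distillation, the sketch below trains each small student to match the residual between the teacher's logits and the running ensemble's prediction, so that any prefix of the ensemble gives a coarser, cheaper approximation of the teacher. This is an assumption-laden sketch, not the paper's exact aggregation rule; `make_student` and `loader` are hypothetical helpers supplied by the caller.

```python
# Sketch of boosting-style progressive distillation (illustrative only,
# not the authors' exact B-DISTIL procedure).
import torch
import torch.nn as nn


def progressive_distill(teacher, make_student, loader, num_students=4,
                        epochs=1, lr=1e-3, device="cpu"):
    teacher.eval().to(device)
    ensemble = []  # students are evaluated in order; any prefix is usable

    def ensemble_logits(x):
        # Sum of the logits of the students trained so far (None if empty).
        if not ensemble:
            return None
        return torch.stack([s(x) for s in ensemble]).sum(dim=0)

    for _ in range(num_students):
        student = make_student().to(device)
        opt = torch.optim.Adam(student.parameters(), lr=lr)
        for _ in range(epochs):
            for x, _ in loader:
                x = x.to(device)
                with torch.no_grad():
                    target = teacher(x)
                    prev = ensemble_logits(x)
                    if prev is not None:
                        # Each new student fits the remaining residual.
                        target = target - prev
                loss = nn.functional.mse_loss(student(x), target)
                opt.zero_grad()
                loss.backward()
                opt.step()
        student.eval()
        ensemble.append(student)
    return ensemble
```

At inference time one would evaluate only the first k students and sum their outputs, trading accuracy against inference cost by choosing k.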

Cite

Text

Dennis et al. "Progressive Ensemble Distillation: Building Ensembles for Efficient Inference." Neural Information Processing Systems, 2023.

Markdown

[Dennis et al. "Progressive Ensemble Distillation: Building Ensembles for Efficient Inference." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/dennis2023neurips-progressive/)

BibTeX

@inproceedings{dennis2023neurips-progressive,
  title     = {{Progressive Ensemble Distillation: Building Ensembles for Efficient Inference}},
  author    = {Dennis, Don and Shetty, Abhishek and Sevekari, Anish Prasad and Koishida, Kazuhito and Smith, Virginia},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/dennis2023neurips-progressive/}
}