Progressive Ensemble Distillation: Building Ensembles for Efficient Inference

Abstract

Knowledge distillation is commonly used to compress an ensemble of models into a single model. In this work we study the problem of progressive ensemble distillation: given a large, pretrained teacher model, we seek to decompose it into an ensemble of smaller, low-inference-cost student models. The resulting ensemble allows for flexibly tuning accuracy vs. inference cost, which can be useful for a multitude of applications in efficient inference. Our method, B-DISTIL, uses a boosting procedure that allows function-composition-based aggregation rules to construct expressive ensembles with performance similar to the teacher's while using much smaller student models. We demonstrate the effectiveness of B-DISTIL by decomposing pretrained models across a variety of image, speech, and sensor datasets. Our method comes with strong theoretical guarantees in terms of convergence as well as generalization.
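
The abstract describes B-DISTIL only at a high level. As a rough illustration of the general boosting-style idea behind progressive ensemble distillation, the sketch below trains each small student to match the residual between the teacher's logits and the running ensemble's prediction, so that any prefix of the ensemble gives a coarser, cheaper approximation of the teacher. This is an assumption-laden sketch, not the paper's exact aggregation rule; `make_student` and `loader` are hypothetical helpers supplied by the caller.

```python
# Sketch of boosting-style progressive distillation (illustrative only,
# not the authors' exact B-DISTIL procedure).
import torch
import torch.nn as nn


def progressive_distill(teacher, make_student, loader, num_students=4,
                        epochs=1, lr=1e-3, device="cpu"):
    teacher.eval().to(device)
    ensemble = []  # students are evaluated in order; any prefix is usable

    def ensemble_logits(x):
        # Sum of the logits of the students trained so far (None if empty).
        if not ensemble:
            return None
        return torch.stack([s(x) for s in ensemble]).sum(dim=0)

    for _ in range(num_students):
        student = make_student().to(device)
        opt = torch.optim.Adam(student.parameters(), lr=lr)
        for _ in range(epochs):
            for x, _ in loader:
                x = x.to(device)
                with torch.no_grad():
                    target = teacher(x)
                    prev = ensemble_logits(x)
                    if prev is not None:
                        # Each new student fits the remaining residual.
                        target = target - prev
                loss = nn.functional.mse_loss(student(x), target)
                opt.zero_grad()
                loss.backward()
                opt.step()
        student.eval()
        ensemble.append(student)
    return ensemble
```

At inference time one would evaluate only the first k students and sum their outputs, trading accuracy against inference cost by choosing k.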

Cite

Text

Dennis et al. "Progressive Ensemble Distillation: Building Ensembles for Efficient Inference." Neural Information Processing Systems, 2023.

Markdown

[Dennis et al. "Progressive Ensemble Distillation: Building Ensembles for Efficient Inference." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/dennis2023neurips-progressive/)

BibTeX

@inproceedings{dennis2023neurips-progressive,
  title     = {{Progressive Ensemble Distillation: Building Ensembles for Efficient Inference}},
  author    = {Dennis, Don and Shetty, Abhishek and Sevekari, Anish Prasad and Koishida, Kazuhito and Smith, Virginia},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/dennis2023neurips-progressive/}
}