Progressive Ensemble Distillation: Building Ensembles for Efficient Inference
Abstract
Knowledge distillation is commonly used to compress an ensemble of models into a single model. In this work we study the reverse problem of progressive ensemble distillation: given a large, pretrained teacher model, we seek to decompose it into an ensemble of smaller, low-inference-cost student models. The resulting ensemble allows for flexibly trading off accuracy against inference cost, which is useful for a multitude of applications in efficient inference. Our method, B-DISTIL, uses a boosting procedure that leverages function-composition-based aggregation rules to construct expressive ensembles whose performance is comparable to the teacher's while using much smaller student models. We demonstrate the effectiveness of B-DISTIL by decomposing pretrained models across a variety of image, speech, and sensor datasets. Our method comes with strong theoretical guarantees in terms of both convergence and generalization.
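To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of progressive ensemble distillation with a simple additive aggregation rule: students are fit one at a time to the residual between the teacher's logits and the running ensemble, and evaluating only a prefix of the ensemble trades accuracy for inference cost. This is not the authors' B-DISTIL algorithm (which uses a boosting procedure and richer function-composition-based aggregation); `make_student`, `loader`, and `budget` are illustrative placeholders.

```python
import torch
import torch.nn as nn

def distill_ensemble(teacher, make_student, loader, num_students=4,
                     epochs=3, lr=1e-3, device="cpu"):
    """Greedy sketch: each new student fits the residual between the
    teacher's logits and the sum of logits of students trained so far.
    Illustrative only; B-DISTIL's boosting-based procedure is not
    reproduced here."""
    teacher = teacher.to(device).eval()
    students = []
    for _ in range(num_students):
        student = make_student().to(device)
        opt = torch.optim.Adam(student.parameters(), lr=lr)
        for _ in range(epochs):
            for x, _ in loader:
                x = x.to(device)
                with torch.no_grad():
                    target = teacher(x)
                    # Output of the ensemble built so far (additive aggregation).
                    partial = (sum(s(x) for s in students)
                               if students else torch.zeros_like(target))
                # The new student learns what the current ensemble still misses.
                loss = nn.functional.mse_loss(student(x), target - partial)
                opt.zero_grad()
                loss.backward()
                opt.step()
        students.append(student.eval())
    return students

def predict(students, x, budget=None):
    """Evaluate only the first `budget` students to reduce inference cost;
    budget=None uses the full ensemble for maximum accuracy."""
    active = students if budget is None else students[:budget]
    return sum(s(x) for s in active)
```

In this additive scheme, each prefix of the student list is itself a valid (cheaper, less accurate) approximation of the teacher, which is what enables tuning accuracy versus inference cost at deployment time.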
Cite
Text
Dennis et al. "Progressive Ensemble Distillation: Building Ensembles for Efficient Inference." Neural Information Processing Systems, 2023.
Markdown
[Dennis et al. "Progressive Ensemble Distillation: Building Ensembles for Efficient Inference." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/dennis2023neurips-progressive/)
BibTeX
@inproceedings{dennis2023neurips-progressive,
  title     = {{Progressive Ensemble Distillation: Building Ensembles for Efficient Inference}},
  author    = {Dennis, Don and Shetty, Abhishek and Sevekari, Anish Prasad and Koishida, Kazuhito and Smith, Virginia},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/dennis2023neurips-progressive/}
}