AMD: Automatic Multi-Step Distillation of Large-Scale Vision Models
Abstract
Transformer-based architectures have become the de facto standard models for diverse vision tasks owing to their superior performance. As the size of these transformer-based models continues to scale up, model distillation becomes extremely important in real-world deployments, particularly on devices limited by computational resources. However, prevailing knowledge distillation methods exhibit diminished efficacy when confronted with a large capacity gap between the teacher and the student, e.g., a 10× compression rate. In this paper, we present a novel approach named Automatic Multi-step Distillation (AMD) for large-scale vision model compression. In particular, our distillation process unfolds across multiple steps. Initially, the teacher undergoes distillation to form an intermediate teacher-assistant model, which is subsequently distilled further to the student. An efficient and effective optimization framework is introduced to automatically identify the optimal teacher-assistant that leads to the maximal student performance. We conduct extensive experiments on multiple image classification datasets, including CIFAR-10, CIFAR-100, and ImageNet. The findings consistently reveal that AMD outperforms several established baselines, paving a path for future knowledge distillation methods on large-scale vision models.
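The paper itself specifies the optimization framework for selecting the teacher-assistant; as a rough, self-contained illustration under our own assumptions, the sketch below shows the standard temperature-softened distillation loss that multi-step schemes typically apply at each stage (teacher → teacher-assistant, then teacher-assistant → student). The model logits here are made-up numbers, not outputs of the actual AMD models.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax with max-subtraction for numerical stability."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

def kd_loss(teacher_logits, student_logits, T=4.0):
    """Soft-label distillation loss: KL(teacher_T || student_T) * T^2.

    In a multi-step pipeline this same loss is minimized twice:
    once distilling the teacher into the teacher-assistant, and
    once distilling the teacher-assistant into the student.
    """
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))) * T * T)

# Hypothetical logits for one input (assumed values, for illustration only).
teacher_out = [5.0, 1.0, -2.0]
assistant_out = [4.0, 1.5, -1.0]   # intermediate teacher-assistant model
student_out = [2.0, 1.0, 0.0]

step1 = kd_loss(teacher_out, assistant_out)  # teacher -> teacher-assistant
step2 = kd_loss(assistant_out, student_out)  # teacher-assistant -> student
```

Because the teacher-assistant sits between the two capacities, each step bridges a smaller gap than direct teacher-to-student distillation; AMD's contribution is automating the choice of that intermediate model.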
Cite
Text
Han et al. "AMD: Automatic Multi-Step Distillation of Large-Scale Vision Models." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73650-6_25
Markdown
[Han et al. "AMD: Automatic Multi-Step Distillation of Large-Scale Vision Models." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/han2024eccv-amd/) doi:10.1007/978-3-031-73650-6_25
BibTeX
@inproceedings{han2024eccv-amd,
title = {{AMD: Automatic Multi-Step Distillation of Large-Scale Vision Models}},
author = {Han, Cheng and Wang, Qifan and Dianat, Sohail A and Rabbani, Majid and Rao, Raghuveer and Fang, Yi and Guan, Qiang and Huang, Lifu and Liu, Dongfang},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-73650-6_25},
url = {https://mlanthology.org/eccv/2024/han2024eccv-amd/}
}