AMD: Automatic Multi-Step Distillation of Large-Scale Vision Models

Abstract

Transformer-based architectures have become the de facto standard models for diverse vision tasks owing to their superior performance. As the size of these transformer-based models continues to scale up, model distillation becomes extremely important in real-world deployments, particularly on devices limited by computational resources. However, prevailing knowledge distillation methods exhibit diminished efficacy when confronted with a large capacity gap between the teacher and the student, e.g., a 10× compression rate. In this paper, we present a novel approach named Automatic Multi-step Distillation (AMD) for large-scale vision model compression. In particular, our distillation process unfolds across multiple steps. Initially, the teacher undergoes distillation to form an intermediate teacher-assistant model, which is subsequently distilled further to the student. An efficient and effective optimization framework is introduced to automatically identify the optimal teacher-assistant that leads to the maximal student performance. We conduct extensive experiments on multiple image classification datasets, including CIFAR-10, CIFAR-100, and ImageNet. The findings consistently reveal that AMD outperforms several established baselines, paving a path for future knowledge distillation methods on large-scale vision models.
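The core idea in the abstract — distilling the teacher into an intermediate teacher-assistant, then distilling that assistant into the student — can be sketched with the standard temperature-scaled KL distillation loss. This is a minimal NumPy illustration of the two-hop chain on toy logits, not the paper's actual optimization framework; the logit scales, temperature, and the way the assistant/student are derived here are all illustrative assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Standard distillation loss: T^2 * KL(teacher || student) at temperature T."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T ** 2)

# Toy logits for a 3-class problem (hypothetical numbers, not real model outputs).
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 3)) * 3.0                    # large, confident teacher
assistant = teacher * 0.6 + rng.normal(size=(4, 3)) * 0.1  # intermediate-capacity TA
student = teacher * 0.3 + rng.normal(size=(4, 3)) * 0.1    # small student

# Direct one-step distillation spans the full capacity gap at once,
# while the multi-step route breaks it into two smaller hops.
direct = kd_loss(student, teacher)
step1 = kd_loss(assistant, teacher)   # teacher -> teacher-assistant
step2 = kd_loss(student, assistant)   # teacher-assistant -> student
print(f"direct: {direct:.4f}, step1: {step1:.4f}, step2: {step2:.4f}")
```

In AMD, the assistant is not hand-picked as above: the paper's contribution is an optimization framework that searches over candidate teacher-assistants to find the one maximizing final student performance.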

Cite

Text

Han et al. "AMD: Automatic Multi-Step Distillation of Large-Scale Vision Models." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73650-6_25

Markdown

[Han et al. "AMD: Automatic Multi-Step Distillation of Large-Scale Vision Models." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/han2024eccv-amd/) doi:10.1007/978-3-031-73650-6_25

BibTeX

@inproceedings{han2024eccv-amd,
  title     = {{AMD: Automatic Multi-Step Distillation of Large-Scale Vision Models}},
  author    = {Han, Cheng and Wang, Qifan and Dianat, Sohail A and Rabbani, Majid and Rao, Raghuveer and Fang, Yi and Guan, Qiang and Huang, Lifu and Liu, Dongfang},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73650-6_25},
  url       = {https://mlanthology.org/eccv/2024/han2024eccv-amd/}
}