DOT: A Distillation-Oriented Trainer

Abstract

Knowledge distillation transfers knowledge from a large model to a small one via task and distillation losses. In this paper, we observe a trade-off between the task and distillation losses, i.e., introducing the distillation loss limits the convergence of the task loss. We believe that the trade-off results from insufficient optimization of the distillation loss. The reasoning is as follows: the teacher has a lower task loss than the student, and a lower distillation loss drives the student to be more similar to the teacher, so a better-converged task loss can be obtained. To break the trade-off, we propose the Distillation-Oriented Trainer (DOT). DOT considers gradients of the task and distillation losses separately, then applies a larger momentum to the distillation loss to accelerate its optimization. We empirically show that DOT breaks the trade-off, i.e., both losses are sufficiently optimized. Extensive experiments validate the superiority of DOT. Notably, DOT achieves a +2.59% accuracy improvement on ImageNet-1k for the ResNet50-MobileNetV1 pair. In conclusion, DOT greatly benefits the student's optimization properties in terms of loss convergence and model generalization. Code will be made publicly available.
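
The abstract describes DOT as a momentum-SGD variant that tracks the gradients of the task loss and the distillation loss in separate momentum buffers, giving the distillation buffer a larger momentum coefficient. The sketch below illustrates one way such an optimizer could be written in PyTorch; the class name DOTSGD, the delta hyperparameter, and the two-step API (step_task / step_kd) are illustrative assumptions based on the abstract, not the authors' released implementation.

import torch

class DOTSGD(torch.optim.Optimizer):
    """Sketch of a distillation-oriented SGD variant (assumed, based on the abstract).

    Keeps two momentum buffers per parameter: one accumulated from task-loss
    gradients (momentum mu - delta) and one from distillation-loss gradients
    (momentum mu + delta), so the distillation term is optimized more aggressively.
    """

    def __init__(self, params, lr=0.1, momentum=0.9, delta=0.075, weight_decay=5e-4):
        defaults = dict(lr=lr, momentum=momentum, delta=delta, weight_decay=weight_decay)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step_task(self):
        # Accumulate gradients of the task (e.g., cross-entropy) loss
        # with the smaller momentum mu - delta.
        self._accumulate(buffer_name="task_buf", sign=-1)

    @torch.no_grad()
    def step_kd(self):
        # Accumulate gradients of the distillation loss with the larger
        # momentum mu + delta, then update parameters with the sum of buffers.
        self._accumulate(buffer_name="kd_buf", sign=+1)
        for group in self.param_groups:
            lr, wd = group["lr"], group["weight_decay"]
            for p in group["params"]:
                state = self.state[p]
                if "task_buf" not in state or "kd_buf" not in state:
                    continue
                d_p = state["task_buf"] + state["kd_buf"]
                if wd != 0:
                    d_p = d_p.add(p, alpha=wd)
                p.add_(d_p, alpha=-lr)

    def _accumulate(self, buffer_name, sign):
        for group in self.param_groups:
            mu = group["momentum"] + sign * group["delta"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                buf = self.state[p].setdefault(buffer_name, torch.zeros_like(p))
                buf.mul_(mu).add_(p.grad)

In use, one would backpropagate the task loss (with retain_graph=True), call step_task(), zero the gradients, backpropagate the distillation loss, and call step_kd(), which combines the two buffers and applies the parameter update.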

Cite

Text

Zhao et al. "DOT: A Distillation-Oriented Trainer." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00569

Markdown

[Zhao et al. "DOT: A Distillation-Oriented Trainer." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/zhao2023iccv-dot/) doi:10.1109/ICCV51070.2023.00569

BibTeX

@inproceedings{zhao2023iccv-dot,
  title     = {{DOT: A Distillation-Oriented Trainer}},
  author    = {Zhao, Borui and Cui, Quan and Song, Renjie and Liang, Jiajun},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {6189--6198},
  doi       = {10.1109/ICCV51070.2023.00569},
  url       = {https://mlanthology.org/iccv/2023/zhao2023iccv-dot/}
}