Knowledge Distillation by On-the-Fly Native Ensemble

Abstract

Knowledge distillation is effective for training small and generalisable network models that meet low-memory and fast-execution requirements. Existing offline distillation methods rely on a strong pre-trained teacher, which enables favourable knowledge discovery and transfer but requires a complex two-phase training procedure. Online counterparts address this limitation at the price of lacking a high-capacity teacher. In this work, we present an On-the-fly Native Ensemble (ONE) learning strategy for one-stage online distillation. Specifically, ONE trains only a single multi-branch network while simultaneously establishing a strong teacher on the fly to enhance the learning of the target network. Extensive evaluations show that ONE improves the generalisation performance of a variety of deep neural networks more significantly than alternative methods on four image classification datasets: CIFAR10, CIFAR100, SVHN, and ImageNet, whilst also offering computational efficiency advantages.
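The core idea in the abstract, a single multi-branch network whose branch outputs are aggregated into an on-the-fly teacher for online distillation, can be illustrated with a short sketch. The PyTorch-style code below is a minimal illustration under stated assumptions, not the authors' released implementation: the trunk architecture, branch classifiers, gating module, loss weighting, teacher detachment, and temperature value are simplified placeholders.

# Minimal sketch of an ONE-style objective: branches share a trunk, a gating
# module forms an ensemble "teacher" from the branch logits, and each branch is
# distilled from that teacher. Sizes, modules, and T are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ONENet(nn.Module):
    def __init__(self, feat_dim=64, num_branches=3, num_classes=10):
        super().__init__()
        # Shared trunk (placeholder for the shared low-level layers of a backbone).
        self.trunk = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Independent high-level branches, each with its own classifier.
        self.branches = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_branches)])
        # Gating module that weights branch logits into the teacher ensemble.
        self.gate = nn.Linear(feat_dim, num_branches)

    def forward(self, x):
        f = self.trunk(x)
        branch_logits = torch.stack([b(f) for b in self.branches], dim=1)  # (B, K, C)
        g = F.softmax(self.gate(f), dim=1).unsqueeze(-1)                   # (B, K, 1)
        teacher_logits = (g * branch_logits).sum(dim=1)                    # (B, C)
        return branch_logits, teacher_logits

def one_loss(branch_logits, teacher_logits, targets, T=3.0):
    # Cross-entropy on every branch and on the gated teacher, plus
    # temperature-scaled KL distillation from the (detached) teacher to each branch.
    num_branches = branch_logits.size(1)
    ce = sum(F.cross_entropy(branch_logits[:, k], targets) for k in range(num_branches))
    ce = ce + F.cross_entropy(teacher_logits, targets)
    soft_teacher = F.softmax(teacher_logits.detach() / T, dim=1)
    kd = sum(F.kl_div(F.log_softmax(branch_logits[:, k] / T, dim=1),
                      soft_teacher, reduction="batchmean") * T * T
             for k in range(num_branches))
    return ce + kd

At deployment, the paper reports using either a single branch (adding no inference cost over the plain backbone) or the gated ensemble; in this sketch those correspond to predicting from branch_logits[:, 0] or from teacher_logits, respectively.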

Cite

Text

Lan et al. "Knowledge Distillation by On-the-Fly Native Ensemble." Neural Information Processing Systems, 2018.

Markdown

[Lan et al. "Knowledge Distillation by On-the-Fly Native Ensemble." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/lan2018neurips-knowledge/)

BibTeX

@inproceedings{lan2018neurips-knowledge,
  title     = {{Knowledge Distillation by On-the-Fly Native Ensemble}},
  author    = {Lan, Xu and Zhu, Xiatian and Gong, Shaogang},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {7517--7527},
  url       = {https://mlanthology.org/neurips/2018/lan2018neurips-knowledge/}
}