Markov Knowledge Distillation: Make Nasty Teachers Trained by Self-Undermining Knowledge Distillation Fully Distillable
Abstract
To protect the intellectual property of a deep neural network (DNN), two knowledge distillation (KD) related concepts are proposed: distillable DNN and KD-resistant DNN. A DNN is said to be distillable if, when used as a black-box input-output teacher, it can be distilled by a KD method to train a student model so that the distilled student outperforms a student trained alone with label smoothing (the LS student) in terms of accuracy. A DNN is said to be KD-resistant with respect to a specific KD method if, when used as a black-box input-output teacher, it cannot be distilled by that method to yield a distilled student outperforming the LS student in terms of accuracy. A new KD method called Markov KD (MKD) is further presented. When applied to nasty teachers trained by self-undermining KD, MKD makes those nasty teachers fully distillable, even though those nasty teachers are shown to be KD-resistant with respect to the state-of-the-art KD methods existing in the literature before our work. When applied to normal teachers, MKD yields distilled students that outperform those trained by KD from the same normal teachers by a large margin. More interestingly, MKD is capable of transferring knowledge from teachers trained in one domain to students trained in another domain.
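For concreteness, the comparison underlying the distillable / KD-resistant definitions can be sketched in code. The sketch below assumes the distilled student is trained with the standard Hinton-style soft-label KD loss (temperature T, mixing weight alpha) against the black-box teacher's outputs, while the baseline LS student is trained with label-smoothed cross-entropy; the function names kd_loss and ls_loss are illustrative, and this is not the paper's MKD objective, which the abstract does not specify.

# Illustrative sketch (not from the paper): the distillable / KD-resistant
# definitions compare two students trained on the same data:
#   (1) a student distilled from the black-box teacher with a KD loss, and
#   (2) a baseline student trained alone with label smoothing (the LS student).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Standard soft-label KD: cross-entropy on hard labels plus a
    # temperature-scaled KL term toward the teacher's output distribution.
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return (1.0 - alpha) * ce + alpha * kl

def ls_loss(student_logits, labels, eps=0.1):
    # Baseline objective for the LS student: label-smoothed cross-entropy.
    return F.cross_entropy(student_logits, labels, label_smoothing=eps)

# A teacher is "distillable" by a given KD method if the student trained with
# that method's loss reaches higher test accuracy than the LS student; it is
# "KD-resistant" with respect to that method otherwise.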
Cite
Text
Yang and Ye. "Markov Knowledge Distillation: Make Nasty Teachers Trained by Self-Undermining Knowledge Distillation Fully Distillable." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73024-5_10Markdown
[Yang and Ye. "Markov Knowledge Distillation: Make Nasty Teachers Trained by Self-Undermining Knowledge Distillation Fully Distillable." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/yang2024eccv-markov/) doi:10.1007/978-3-031-73024-5_10BibTeX
@inproceedings{yang2024eccv-markov,
title = {{Markov Knowledge Distillation: Make Nasty Teachers Trained by Self-Undermining Knowledge Distillation Fully Distillable}},
author = {Yang, En-hui and Ye, Linfeng},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-73024-5_10},
url = {https://mlanthology.org/eccv/2024/yang2024eccv-markov/}
}