Improved Knowledge Distillation for Training Fast Low Resolution Face Recognition Model

Abstract

Low-resolution (LR) face recognition (FR) is a challenging yet common problem, especially in surveillance scenarios. The issue addressed here is not only to build an LR-FR model, but more importantly to make it run fast. We adopt knowledge distillation for this task, in which the teacher's knowledge is 'distilled' into a small student model by guiding its training. For LR-FR, the original distillation scheme first updates the teacher's weights by fine-tuning it on an LR-augmented training set, and then trains the student on the same set under the updated teacher's guidance. The problem with this approach is that fine-tuning the large teacher model is time-consuming, especially on large-scale datasets. In this paper, we propose an improved scheme that avoids retraining the teacher while still training the small student model for LR-FR. Unlike the original scheme, the training sets for the teacher and the student differ: the teacher's training set is left unchanged, while the student's is LR-augmented. It therefore becomes unnecessary to update the teacher model, since its training set does not change; only the small student model needs to be trained under the original teacher's guidance. This speeds up the whole training process, especially on large-scale datasets. Because the teacher and student see different training sets, the data distribution discrepancy between them increases. To address this, we constrain the multi-kernel maximum mean discrepancy between their outputs to reduce this influence. Experimental results show that our method accelerates the training process by about 5 times while preserving accuracy. Our student model matches state-of-the-art accuracy on LFW and SCFace, achieves a 3x speedup over the teacher model, and takes only 35 ms to run on a CPU.
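
The paper does not include code; the sketch below is a minimal PyTorch illustration (an assumed framework) of the general idea described in the abstract: the student, fed LR images, is trained with a standard classification loss plus a multi-kernel maximum mean discrepancy (MK-MMD) term that pulls its embeddings toward those of the frozen teacher, which sees the unchanged high-resolution data. The function names, the Gaussian kernel bandwidths, the cross-entropy classification term, and the weight lam are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def mk_mmd(x, y, sigmas=(1.0, 2.0, 4.0, 8.0, 16.0)):
    # Biased MK-MMD estimate between two batches of embeddings,
    # using a mixture of Gaussian (RBF) kernels with the given bandwidths.
    xy = torch.cat([x, y], dim=0)                      # (2B, D)
    d2 = torch.cdist(xy, xy, p=2).pow(2)               # pairwise squared distances
    k = sum(torch.exp(-d2 / (2.0 * s ** 2)) for s in sigmas) / len(sigmas)
    b = x.size(0)
    k_xx, k_yy, k_xy = k[:b, :b], k[b:, b:], k[:b, b:]
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

def student_loss(student_emb, teacher_emb, logits, labels, lam=1.0):
    # Hypothetical training objective: classification loss on the student's
    # LR inputs plus an MK-MMD term aligning student and teacher embeddings.
    # teacher_emb is assumed to be computed under torch.no_grad(), since the
    # teacher is kept frozen in the proposed scheme.
    return F.cross_entropy(logits, labels) + lam * mk_mmd(student_emb, teacher_emb)

In training, only the student's parameters would be updated with this loss, which is what lets the scheme skip the expensive teacher fine-tuning step described in the abstract.
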

Cite

Text

Wang et al. "Improved Knowledge Distillation for Training Fast Low Resolution Face Recognition Model." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00324

Markdown

[Wang et al. "Improved Knowledge Distillation for Training Fast Low Resolution Face Recognition Model." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/wang2019iccvw-improved/) doi:10.1109/ICCVW.2019.00324

BibTeX

@inproceedings{wang2019iccvw-improved,
  title     = {{Improved Knowledge Distillation for Training Fast Low Resolution Face Recognition Model}},
  author    = {Wang, Mengjiao and Liu, Rujie and Nada, Hajime and Abe, Narishige and Uchida, Hidetsugu and Matsunami, Tomoaki},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {2655-2661},
  doi       = {10.1109/ICCVW.2019.00324},
  url       = {https://mlanthology.org/iccvw/2019/wang2019iccvw-improved/}
}