Class Incremental Learning with Multi-Teacher Distillation
Abstract
Distillation strategies are currently the primary approaches for mitigating forgetting in class incremental learning (CIL). Existing methods generally inherit previous knowledge from a single teacher. However, teachers with different mechanisms excel at different tasks, and inheriting diverse knowledge from them can enhance compatibility with new knowledge. In this paper, we propose the multi-teacher distillation (MTD) method to find multiple diverse teachers for CIL. Specifically, we adopt weight permutation, feature perturbation, and diversity regularization techniques to ensure diverse mechanisms in the teachers. To reduce time and memory consumption, each teacher is represented as a small branch in the model. We adapt existing CIL distillation strategies to MTD, and extensive experiments on CIFAR-100, ImageNet-100, and ImageNet-1000 show significant performance improvements. Our code is available at https://github.com/HaitaoWen/CLearning.
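To make the core idea concrete, below is a minimal sketch (not the authors' released code) of how a multi-teacher distillation loss might look in PyTorch: the student's logits are pulled toward the soft predictions of several frozen teacher branches via temperature-scaled KL divergence, here combined by simple averaging. The function name, the temperature default, and the averaging rule are all illustrative assumptions; the paper's exact formulation may differ.

```python
# Hypothetical sketch of a multi-teacher distillation loss for CIL.
# Assumptions (not from the paper): equal-weight averaging over teachers,
# temperature T=2.0, KL divergence as the distillation objective.
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, T=2.0):
    """Average temperature-scaled KL distillation loss over several teachers.

    student_logits: (B, C) logits of the current (student) model.
    teacher_logits_list: list of (B, C) logits from frozen teacher branches.
    T: softmax temperature (hypothetical default).
    """
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    loss = torch.zeros((), device=student_logits.device)
    for teacher_logits in teacher_logits_list:
        p_teacher = F.softmax(teacher_logits.detach() / T, dim=1)
        # T*T rescaling keeps gradient magnitudes comparable across temperatures.
        loss = loss + F.kl_div(log_p_student, p_teacher,
                               reduction="batchmean") * (T * T)
    return loss / len(teacher_logits_list)
```

In practice, this term would be added to the standard cross-entropy loss on new-class data; the abstract's weight permutation and feature perturbation steps would be applied when constructing the teacher branches, upstream of this loss.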
Cite
Text
Wen et al. "Class Incremental Learning with Multi-Teacher Distillation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.02687
Markdown
[Wen et al. "Class Incremental Learning with Multi-Teacher Distillation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/wen2024cvpr-class/) doi:10.1109/CVPR52733.2024.02687
BibTeX
@inproceedings{wen2024cvpr-class,
title = {{Class Incremental Learning with Multi-Teacher Distillation}},
author = {Wen, Haitao and Pan, Lili and Dai, Yu and Qiu, Heqian and Wang, Lanxiao and Wu, Qingbo and Li, Hongliang},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {28443--28452},
doi = {10.1109/CVPR52733.2024.02687},
url = {https://mlanthology.org/cvpr/2024/wen2024cvpr-class/}
}