Debiased Distillation for Consistency Regularization
Abstract
Knowledge distillation transfers "dark knowledge" from a large teacher model to a smaller student model, yielding a highly efficient network. To improve the network's generalization ability, existing works use a larger temperature coefficient for knowledge distillation. Nevertheless, these methods may lower the target category's confidence and lead to ambiguous recognition of similar samples. To mitigate this issue, some studies introduce intra-batch distillation to reduce prediction discrepancy. However, these methods overlook the inconsistency between background information and the target category, which may increase prediction bias due to noise disturbance. Additionally, label imbalance arising from random sampling and the batch size can undermine the reliability of the network's generalization. To tackle these challenges, we propose a simple yet effective Intra-class Knowledge Distillation (IKD) method that facilitates knowledge sharing within the same class to ensure consistent predictions. First, we initialize a matrix and a vector to store the logits and class counts provided by the teacher, respectively. Then, in the first epoch, we accumulate the sum of logits and the sample count per class and perform standard KD to prevent knowledge omission. Finally, in subsequent training, we update the matrix to obtain the average logits per class and compute the KL divergence between the student's output and the matrix entry indexed by each sample's label. This process enforces intra-class consistency and improves the student's performance; moreover, it provably reduces prediction bias by ensuring intra-class consistency. Extensive experiments on the CIFAR-100, ImageNet-1K, and Tiny-ImageNet datasets validate the superiority of IKD.
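The procedure described above (a class-indexed logit matrix, a class-count vector, first-epoch accumulation alongside standard KD, then distillation against the running class averages) can be illustrated with a minimal PyTorch-style sketch. All names here (`IntraClassLogitBank`, `update`, `kd_loss`) and the temperature value are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F


class IntraClassLogitBank:
    """Hypothetical sketch of intra-class logit averaging for distillation."""

    def __init__(self, num_classes: int, temperature: float = 4.0):
        # Running sum of teacher logits per class and per-class sample counts.
        self.sum_logits = torch.zeros(num_classes, num_classes)
        self.counts = torch.zeros(num_classes)
        self.T = temperature

    @torch.no_grad()
    def update(self, teacher_logits: torch.Tensor, labels: torch.Tensor):
        # Accumulate teacher logits and sample counts for each ground-truth class.
        self.sum_logits.index_add_(0, labels.cpu(), teacher_logits.detach().cpu())
        self.counts.index_add_(
            0, labels.cpu(), torch.ones(labels.numel(), dtype=torch.float)
        )

    def class_average(self, labels: torch.Tensor) -> torch.Tensor:
        # Average teacher logits of each sample's class (guard against empty classes).
        avg = self.sum_logits / self.counts.clamp(min=1.0).unsqueeze(1)
        return avg[labels.cpu()]

    def kd_loss(self, student_logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # KL divergence between the student's softened output and the stored
        # class-average teacher logits, used as an intra-class consistency target.
        target = self.class_average(labels).to(student_logits.device)
        return F.kl_div(
            F.log_softmax(student_logits / self.T, dim=1),
            F.softmax(target / self.T, dim=1),
            reduction="batchmean",
        ) * self.T ** 2
```

In a training loop under these assumptions, `update` would be called on the teacher's logits for every batch, the usual KD loss would be applied during the first epoch while the bank fills, and `kd_loss` would supply the intra-class consistency term in subsequent epochs.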
Cite
Text
Wang et al. "Debiased Distillation for Consistency Regularization." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I8.32840
Markdown
[Wang et al. "Debiased Distillation for Consistency Regularization." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/wang2025aaai-debiased/) doi:10.1609/AAAI.V39I8.32840
BibTeX
@inproceedings{wang2025aaai-debiased,
title = {{Debiased Distillation for Consistency Regularization}},
author = {Wang, Lu and Xu, Liuchi and Yang, Xiong and Huang, Zhenhua and Cheng, Jun},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {7799--7807},
doi = {10.1609/AAAI.V39I8.32840},
url = {https://mlanthology.org/aaai/2025/wang2025aaai-debiased/}
}