Quantifying Knowledge Distillation Using Partial Information Decomposition
Abstract
Knowledge distillation enables deploying complex machine learning models in resource-constrained environments by training a smaller student model to emulate the internal representations of a complex teacher model. However, the teacher’s representations can also encode nuisance or additional information that is not relevant to the downstream task. Distilling such irrelevant information can actually impede the performance of a capacity-limited student model. This observation motivates our primary question: what are the information-theoretic limits of knowledge distillation? To answer it, we leverage Partial Information Decomposition to quantify and explain the knowledge already transferred to the student and the knowledge still left to distill for a downstream task. We theoretically demonstrate that the task-relevant transferred knowledge is succinctly captured by the redundant information about the task shared between the teacher and the student. We propose a novel multi-level optimization that incorporates this redundant information as a regularizer, leading to our framework of Redundant Information Distillation (RID). RID yields more resilient and effective distillation under nuisance teachers because it explicitly quantifies task-relevant knowledge rather than simply aligning student and teacher representations.
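For readers unfamiliar with Partial Information Decomposition, the sketch below shows the standard four-way decomposition in generic PID notation; the specific redundancy measure and multi-level optimization used by RID are defined in the paper itself, and reading the teacher's unique information as the "knowledge left to distill" is an illustrative interpretation of the abstract, not a result quoted from it. Writing Y for the task variable and T, S for the teacher and student representations,

I(Y; T, S) = \mathrm{Red}(Y; T, S) + \mathrm{Unq}(Y; T \setminus S) + \mathrm{Unq}(Y; S \setminus T) + \mathrm{Syn}(Y; T, S),

where \mathrm{Red}(Y; T, S) is the redundant information about Y shared by the teacher and student (the quantity the abstract identifies with task-relevant transferred knowledge and that RID uses as a regularizer), \mathrm{Unq}(Y; T \setminus S) is the teacher's unique information about Y (a natural candidate for the task-relevant knowledge still left to distill), \mathrm{Unq}(Y; S \setminus T) is the student's unique information, and \mathrm{Syn}(Y; T, S) is the synergistic information available only from the two representations jointly.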
Cite
Text
Dissanayake et al. "Quantifying Knowledge Distillation Using Partial Information Decomposition." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.
Markdown
[Dissanayake et al. "Quantifying Knowledge Distillation Using Partial Information Decomposition." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.](https://mlanthology.org/aistats/2025/dissanayake2025aistats-quantifying/)
BibTeX
@inproceedings{dissanayake2025aistats-quantifying,
title = {{Quantifying Knowledge Distillation Using Partial Information Decomposition}},
author = {Dissanayake, Pasan and Hamman, Faisal and Halder, Barproda and Sucholutsky, Ilia and Zhang, Qiuyi and Dutta, Sanghamitra},
booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
year = {2025},
pages = {4474--4482},
volume = {258},
url = {https://mlanthology.org/aistats/2025/dissanayake2025aistats-quantifying/}
}