What Mechanisms Does Knowledge Distillation Distill?
Abstract
Knowledge distillation is a commonly used compression method in machine learning, driven by the popularity of increasingly large-scale models, but it is unclear whether all of the information a teacher model contains is distilled into the smaller student model. We aim to formalize the concept of 'knowledge' in order to investigate how knowledge is transferred during distillation, focusing on shared invariance of outputs to counterfactual changes of dataset latent variables (we call these latents mechanisms). We define a student model to be a good stand-in for a teacher if it shares the teacher's learned mechanisms, and we find that Jacobian matching and contrastive representation learning are viable methods for training such models. While these methods do not result in perfect transfer of mechanisms, we show that they often improve student fidelity or mitigate simplicity bias (as measured by the teacher-to-student KL divergence and by accuracy on various out-of-distribution test datasets), especially on datasets with spurious statistical correlations.
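The abstract names two transfer methods (Jacobian matching and contrastive representation learning) and one fidelity metric (teacher-to-student KL divergence). As a rough illustration only, and not the authors' implementation, the PyTorch-style sketch below combines the standard soft-label KL distillation term with an input-Jacobian matching penalty; the function name distillation_loss and the temperature and jacobian_weight parameters are assumptions introduced for this example.

# Illustrative sketch only: soft-label KL distillation plus Jacobian matching.
# All names here (distillation_loss, temperature, jacobian_weight) are assumed
# for this example and are not taken from the paper's code.
import torch
import torch.nn.functional as F

def distillation_loss(student, teacher, x, temperature=2.0, jacobian_weight=1.0):
    # Inputs need gradients so that input-Jacobians can be computed; the teacher
    # forward pass therefore cannot run under torch.no_grad().
    x = x.detach().requires_grad_(True)
    t_logits = teacher(x)
    s_logits = student(x)

    # Standard distillation term: KL between softened teacher and student outputs.
    # The teacher logits are detached so that only the student receives gradients.
    kl = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits.detach() / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Jacobian matching term: align d(top-class logit)/d(input) of the student
    # with that of the teacher.
    top = t_logits.argmax(dim=-1, keepdim=True)
    t_grad = torch.autograd.grad(t_logits.gather(1, top).sum(), x)[0]
    s_grad = torch.autograd.grad(s_logits.gather(1, top).sum(), x, create_graph=True)[0]
    jacobian = F.mse_loss(s_grad, t_grad.detach())

    return kl + jacobian_weight * jacobian

Calling backward() on the returned value updates only the student's parameters, since the teacher contributes through a detached soft target and a fixed reference Jacobian.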
Cite
Text
Wu et al. "What Mechanisms Does Knowledge Distillation Distill?." NeurIPS 2023 Workshops: UniReps, 2023.
Markdown
[Wu et al. "What Mechanisms Does Knowledge Distillation Distill?." NeurIPS 2023 Workshops: UniReps, 2023.](https://mlanthology.org/neuripsw/2023/wu2023neuripsw-mechanisms/)
BibTeX
@inproceedings{wu2023neuripsw-mechanisms,
title = {{What Mechanisms Does Knowledge Distillation Distill?}},
author = {Wu, Cindy and Lubana, Ekdeep Singh and Mlodozeniec, Bruno Kacper and Kirk, Robert and Krueger, David},
booktitle = {NeurIPS 2023 Workshops: UniReps},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/wu2023neuripsw-mechanisms/}
}