Model Compression Using Optimal Transport

Abstract

Model compression methods are important for easier deployment of deep learning models in compute-, memory-, and energy-constrained environments such as mobile phones. Knowledge distillation is a class of model compression algorithms in which knowledge from a large teacher network is transferred to a smaller student network, thereby improving the student's performance. In this paper, we show how optimal transport-based loss functions can be used to train a student network, encouraging student network parameters that bring the distribution of student features closer to that of the teacher features. We present image classification results on CIFAR-100, SVHN, and ImageNet, and show that the proposed optimal transport loss functions perform comparably to or better than other loss functions.
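The paper's exact loss is not reproduced here, but as a rough illustration of the idea, the sketch below computes an entropy-regularized (Sinkhorn) optimal transport distance between a batch of student features and a batch of teacher features in PyTorch. The function name, the squared-Euclidean cost, and the hyperparameters `eps`, `n_iters`, and `lambda_ot` are illustrative assumptions, not the paper's formulation.

```python
import torch

def sinkhorn_ot_loss(student_feats, teacher_feats, eps=0.1, n_iters=50):
    """Entropy-regularized OT distance between two feature batches.

    student_feats: (n, d) tensor, teacher_feats: (m, d) tensor.
    Illustrative sketch only; not the paper's exact loss.
    """
    # Pairwise squared-Euclidean cost between student and teacher features.
    cost = torch.cdist(student_feats, teacher_feats, p=2) ** 2  # (n, m)
    n, m = cost.shape
    # Uniform marginals over the two batches.
    a = torch.full((n,), 1.0 / n, device=cost.device)
    b = torch.full((m,), 1.0 / m, device=cost.device)
    # Gibbs kernel and Sinkhorn fixed-point iterations.
    K = torch.exp(-cost / eps)
    u = torch.ones_like(a)
    v = torch.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v + 1e-9)
        v = b / (K.t() @ u + 1e-9)
    # Transport plan and its expected cost.
    P = u.unsqueeze(1) * K * v.unsqueeze(0)
    return (P * cost).sum()

# Hypothetical usage: add the OT term to a standard distillation objective.
# student_feats = student_backbone(images)            # e.g. penultimate features
# teacher_feats = teacher_backbone(images).detach()   # teacher is frozen
# loss = ce_loss + lambda_ot * sinkhorn_ot_loss(student_feats, teacher_feats)
```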

Cite

Text

Lohit and Jones. "Model Compression Using Optimal Transport." Winter Conference on Applications of Computer Vision, 2022.

Markdown

[Lohit and Jones. "Model Compression Using Optimal Transport." Winter Conference on Applications of Computer Vision, 2022.](https://mlanthology.org/wacv/2022/lohit2022wacv-model/)

BibTeX

@inproceedings{lohit2022wacv-model,
  title     = {{Model Compression Using Optimal Transport}},
  author    = {Lohit, Suhas and Jones, Michael},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2022},
  pages     = {2764--2773},
  url       = {https://mlanthology.org/wacv/2022/lohit2022wacv-model/}
}