Large Scale Dataset Distillation with Domain Shift

Noel Loo, Alaa Maalouf, Ramin Hasani, Mathias Lechner, Alexander Amini, Daniela Rus

ICML 2024 pp. 32759-32780


Abstract

Dataset Distillation seeks to summarize a large dataset by generating a reduced set of synthetic samples. While there has been considerable success distilling small datasets such as CIFAR-10 for small neural architectures, Dataset Distillation methods fail to scale to larger, high-resolution datasets and architectures. In this work, we introduce Dataset Distillation with Domain Shift (D3S), a scalable distillation algorithm obtained by reframing the dataset distillation problem as a domain shift problem. In doing so, we derive a universal bound on the distillation loss and provide an efficient method for approximately optimizing it. We achieve state-of-the-art results on Tiny-ImageNet, ImageNet-1k, and ImageNet-21K against a variety of recently proposed baselines, including strong cross-architecture generalization. Additionally, our ablation studies provide lessons on the importance of validation-time hyperparameters for distillation performance, motivating the need for standardization.


Cite

Text

Loo et al. "Large Scale Dataset Distillation with Domain Shift." International Conference on Machine Learning, 2024.

Markdown

[Loo et al. "Large Scale Dataset Distillation with Domain Shift." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/loo2024icml-large/)

BibTeX

@inproceedings{loo2024icml-large,
  title     = {{Large Scale Dataset Distillation with Domain Shift}},
  author    = {Loo, Noel and Maalouf, Alaa and Hasani, Ramin and Lechner, Mathias and Amini, Alexander and Rus, Daniela},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {32759-32780},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/loo2024icml-large/}
}