Distributional Dataset Distillation with Subtask Decomposition

Abstract

What does a neural network learn when training on a task-specific dataset? Synthesizing this knowledge is the central idea behind Dataset Distillation, which recent work has shown can be used to compress a large dataset into a small set of input-label pairs (*prototypes*) that capture essential aspects of the original dataset. In this paper, we make the key observation that existing methods that distill into explicit prototypes are often suboptimal, incurring unexpected storage costs from distilled labels. In response, we propose *Distributional Dataset Distillation* (D3), which encodes the data using minimal sufficient per-class statistics paired with a decoder, allowing for distillation into a compact distributional representation that is more memory-efficient than prototype-based methods. To scale up the process of learning these representations, we propose *Federated distillation*, which decomposes the dataset into subsets, distills them in parallel using sub-task experts, and then re-aggregates them. We thoroughly evaluate our algorithm using a multi-faceted metric, showing that our method achieves state-of-the-art results on TinyImageNet and ImageNet-1K. Specifically, we outperform the prior art by 6.9% on ImageNet-1K under a storage budget equivalent to 2 images per class.
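The abstract describes two components: a distributional representation (per-class latent statistics plus a shared decoder) and a federated, subtask-decomposed distillation procedure. The PyTorch sketch below is only one reading of that high-level description, not the authors' implementation; every module name, dimension, and the shard-then-aggregate helper are illustrative assumptions.

# A minimal sketch, based only on the abstract, of (i) a distributional distilled
# dataset that stores per-class latent statistics plus a shared decoder instead of
# explicit image prototypes, and (ii) a federated-style decomposition that distills
# disjoint class subsets independently and then aggregates them. All names, sizes,
# and details are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn

LATENT_DIM, IMG_CHANNELS, IMG_SIZE = 64, 3, 32


class ClassDistribution(nn.Module):
    """Minimal sufficient per-class statistics: a diagonal Gaussian in latent space."""

    def __init__(self, num_classes: int, latent_dim: int = LATENT_DIM):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(num_classes, latent_dim))
        self.log_var = nn.Parameter(torch.zeros(num_classes, latent_dim))

    def sample(self, labels: torch.Tensor) -> torch.Tensor:
        # Reparameterized sampling keeps the statistics differentiable.
        eps = torch.randn(labels.shape[0], self.mu.shape[1])
        return self.mu[labels] + eps * torch.exp(0.5 * self.log_var[labels])


class Decoder(nn.Module):
    """Shared decoder mapping latent samples to synthetic images."""

    def __init__(self, latent_dim: int = LATENT_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, IMG_CHANNELS * IMG_SIZE * IMG_SIZE), nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z).view(-1, IMG_CHANNELS, IMG_SIZE, IMG_SIZE)


def synthesize_batch(dist, decoder, classes, samples_per_class=2):
    """Decode latent samples into a labeled synthetic batch for student training."""
    labels = torch.as_tensor(classes).repeat_interleave(samples_per_class)
    images = decoder(dist.sample(labels))
    return images, labels


def federated_distill(all_classes, num_shards=4):
    """Split the label set into shards, distill each independently, then aggregate.

    Each shard would normally be optimized in parallel against its own sub-task
    expert; here we only build the per-shard distributional containers, and
    re-aggregation amounts to keeping the union of the shard statistics.
    """
    shards = [all_classes[i::num_shards] for i in range(num_shards)]
    return [(shard, ClassDistribution(len(shard)), Decoder()) for shard in shards]


if __name__ == "__main__":
    experts = federated_distill(list(range(200)), num_shards=4)  # e.g. TinyImageNet has 200 classes
    shard, dist, decoder = experts[0]
    images, labels = synthesize_batch(dist, decoder, classes=list(range(len(shard))))
    print(images.shape, labels.shape)  # torch.Size([100, 3, 32, 32]) torch.Size([100])

Storing only the per-class means, log-variances, and the shared decoder is what makes the representation more compact than explicit prototypes; a student network would be trained on batches produced by synthesize_batch rather than on stored images.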

Cite

Text

Qin et al. "Distributional Dataset Distillation with Subtask Decomposition." ICLR 2024 Workshops: DPFM, 2024.

Markdown

[Qin et al. "Distributional Dataset Distillation with Subtask Decomposition." ICLR 2024 Workshops: DPFM, 2024.](https://mlanthology.org/iclrw/2024/qin2024iclrw-distributional/)

BibTeX

@inproceedings{qin2024iclrw-distributional,
  title     = {{Distributional Dataset Distillation with Subtask Decomposition}},
  author    = {Qin, Tian and Deng, Zhiwei and Alvarez-Melis, David},
  booktitle = {ICLR 2024 Workshops: DPFM},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/qin2024iclrw-distributional/}
}