Grounding and Enhancing Informativeness and Utility in Dataset Distillation

Abstract

Dataset Distillation (DD) seeks to create a compact dataset from a large, real-world dataset. While recent methods often rely on heuristic approaches to balance efficiency and quality, the fundamental relationship between original and synthetic data remains underexplored. This paper revisits knowledge distillation-based dataset distillation within a solid theoretical framework. We introduce the concepts of Informativeness and Utility, which capture the crucial information within a sample and the essential samples in the training set, respectively. Building on these principles, we define optimal dataset distillation mathematically. We then present InfoUtil, a framework that balances informativeness and utility in synthesizing the distilled dataset. InfoUtil incorporates two key components: (1) game-theoretic informativeness maximization, which uses Shapley Value attribution to extract key information from samples, and (2) principled utility maximization, which selects globally influential samples based on Gradient Norm. These components ensure that the distilled dataset is both informative and utility-optimized. Experiments demonstrate that our method achieves a 6.1% performance improvement over the previous state-of-the-art approach on the ImageNet-1K dataset using ResNet-18.
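The two ingredients named in the abstract, Shapley Value attribution and Gradient Norm-based sample selection, can be illustrated with a minimal sketch. The abstract does not specify InfoUtil's actual procedure, so everything below is an assumption for illustration only: the function names (`shapley_values`, `grad_norm_scores`, `select_top_k`) are hypothetical, the Shapley estimate uses generic permutation sampling over an arbitrary value function, and a linear softmax classifier stands in for the real network so that the per-sample gradient norm has a closed form.

```python
import numpy as np


def shapley_values(value_fn, n_players, n_permutations=200, seed=0):
    """Monte Carlo (permutation-sampling) estimate of Shapley values.

    value_fn takes a boolean mask of length n_players (True = player present)
    and returns a scalar payoff. In an image setting, a "player" could be a
    patch; that mapping is an assumption, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    phi = np.zeros(n_players)
    for _ in range(n_permutations):
        mask = np.zeros(n_players, dtype=bool)
        prev = value_fn(mask)
        for player in rng.permutation(n_players):
            mask[player] = True
            cur = value_fn(mask)
            phi[player] += cur - prev  # marginal contribution of this player
            prev = cur
    return phi / n_permutations


def grad_norm_scores(W, X, y):
    """Per-sample gradient norm of cross-entropy w.r.t. W for logits = X @ W.T.

    For sample i the gradient is outer(p_i - onehot(y_i), x_i), so its
    Frobenius norm factorizes as ||p_i - onehot(y_i)|| * ||x_i||.
    """
    logits = X @ W.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(len(y)), y] -= 1.0  # now holds p - onehot(y)
    return np.linalg.norm(probs, axis=1) * np.linalg.norm(X, axis=1)


def select_top_k(scores, k):
    """Indices of the k samples with the largest scores."""
    return np.argsort(-scores)[:k]
```

For an additive value function, such as the sum of fixed per-player weights, every marginal contribution equals the player's own weight, so the sampled estimate recovers the weights regardless of the permutation count. This gives a quick sanity check before plugging in a model-based payoff.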

Cite

Text

Wang et al. "Grounding and Enhancing Informativeness and Utility in Dataset Distillation." International Conference on Learning Representations, 2026.

Markdown

[Wang et al. "Grounding and Enhancing Informativeness and Utility in Dataset Distillation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wang2026iclr-grounding/)

BibTeX

@inproceedings{wang2026iclr-grounding,
  title     = {{Grounding and Enhancing Informativeness and Utility in Dataset Distillation}},
  author    = {Wang, Shaobo and Yang, Yantai and Chen, Guo and Li, Peiru and Li, Kaixin and Zhou, Yufa and Chen, Zhaorun and Zhang, Linfeng},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/wang2026iclr-grounding/}
}