Data Distillation: A Survey

Abstract

The popularity of deep learning has led to the curation of a vast number of massive and multifarious datasets. Despite such models achieving close-to-human performance on individual tasks, training parameter-hungry models on these large datasets poses multi-faceted problems, such as (a) long model-training times; (b) slow research iteration; and (c) poor eco-sustainability. As an alternative, data distillation approaches aim to synthesize terse data summaries that can serve as effective drop-in replacements for the original dataset in scenarios such as model training, inference, and architecture search. In this survey, we present a formal framework for data distillation and provide a detailed taxonomy of existing approaches. Additionally, we cover data distillation approaches for different data modalities, namely images, graphs, and user-item interactions (recommender systems), while also identifying current challenges and future research directions.
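
For context, most approaches in the dataset distillation literature frame the problem as a bilevel optimization: fit a model on the small synthetic summary in an inner loop, and optimize the summary so that this model performs well on the original data in an outer loop. The LaTeX snippet below is a minimal sketch of this generic objective; the notation (D for the original dataset, D_syn for the synthetic summary, theta for model parameters, L for the loss) is ours and may differ from the formalism used in the survey itself.

\documentclass{article}
\usepackage{amsmath, amssymb}
\DeclareMathOperator*{\argmin}{arg\,min}
\begin{document}
% Sketch of the generic bilevel data distillation objective (our notation):
% synthesize a small summary D_syn of a large dataset D such that a model
% trained on D_syn minimizes the loss measured on the full dataset D.
\begin{equation*}
\mathcal{D}_{\mathrm{syn}}^{\star}
  \in \argmin_{\mathcal{D}_{\mathrm{syn}}}
      \mathcal{L}\bigl(\theta^{\star}(\mathcal{D}_{\mathrm{syn}});\, \mathcal{D}\bigr)
\quad \text{s.t.} \quad
\theta^{\star}(\mathcal{D}_{\mathrm{syn}})
  \in \argmin_{\theta} \mathcal{L}\bigl(\theta;\, \mathcal{D}_{\mathrm{syn}}\bigr)
\end{equation*}
\end{document}

Individual methods differ mainly in how they approximate the expensive inner optimization, e.g. via meta-learning through unrolled training, gradient or trajectory matching, or distribution matching.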

Cite

Text

Sachdeva and McAuley. "Data Distillation: A Survey." Transactions on Machine Learning Research, 2023.

Markdown

[Sachdeva and McAuley. "Data Distillation: A Survey." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/sachdeva2023tmlr-data/)

BibTeX

@article{sachdeva2023tmlr-data,
  title     = {{Data Distillation: A Survey}},
  author    = {Sachdeva, Noveen and McAuley, Julian},
  journal   = {Transactions on Machine Learning Research},
  year      = {2023},
  url       = {https://mlanthology.org/tmlr/2023/sachdeva2023tmlr-data/}
}